Match <keyword> with whitespace at end/start of line - c#

I can't figure out how to get a C# regex IsMatch to match a <keyword> followed by an end of line or whitespace.
I currently have [\s]+keyword[\s]+ which works for spaces, but does not work for keyword<end of string> or <start of string>keyword.
I have tried [\s^]+keyword[\s$]+, but this makes it fail to match with the spaces, and doesn't work at the end or start of a string.
Here's the code I tried:
string pattern = string.Format("[\\s^]+{0}[\\s$]+",keyword);
if(Regex.IsMatch(Text, pattern, RegexOptions.IgnoreCase))

The problem is that ^ and $ inside character classes are not treated as anchors but as literal characters. You could simply use alternation instead of a character class:
string pattern = string.Format(@"(?:\s|^){0}(?:\s|$)", keyword);
Note that there is no need for the +, because you only need to make sure there is at least one whitespace character; you don't care whether there are more. The ?: is just good practice and suppresses capturing, which you don't need here. And the @ makes the string a verbatim string, where you don't have to double-escape your backslashes.
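Here is a minimal sketch of the alternation pattern wrapped in a helper. The class and method names (KeywordAlternation, ContainsKeyword) are hypothetical, and the keyword is assumed to contain no regex metacharacters.

```csharp
using System.Text.RegularExpressions;

// Hypothetical helper wrapping the alternation pattern from this answer.
public static class KeywordAlternation
{
    // True when `keyword` is preceded by whitespace or the start of the string
    // and followed by whitespace or the end of the string.
    // Assumes `keyword` contains no regex metacharacters.
    public static bool ContainsKeyword(string text, string keyword)
    {
        string pattern = string.Format(@"(?:\s|^){0}(?:\s|$)", keyword);
        return Regex.IsMatch(text, pattern, RegexOptions.IgnoreCase);
    }
}
```

With this, "keyword at the start" and "at the end keyword" both match, while "mykeywords" does not.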
There is another way, which I find slightly neater. You can use lookarounds to ensure that there is no non-space character to the left or right of your keyword (yes, double negation, think about it). This condition holds if the neighbouring character is whitespace or if the keyword sits at either end of the string:
string pattern = string.Format(@"(?<!\S){0}(?!\S)", keyword);
For IsMatch this gives the same result, but it might be slightly more efficient (you'd have to profile that to be certain, though, if it even matters).
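One visible difference is what ends up in the match: the alternation version consumes the surrounding whitespace, while the lookarounds only check it. A small sketch with hypothetical names:

```csharp
using System.Text.RegularExpressions;

// Hypothetical class comparing what each pattern actually matches.
public static class ConsumeVsLookaround
{
    // Alternation: the whitespace on either side becomes part of the match.
    public static string WithAlternation(string text) =>
        Regex.Match(text, @"(?:\s|^)keyword(?:\s|$)").Value;

    // Lookarounds: the surrounding characters are checked, not consumed.
    public static string WithLookarounds(string text) =>
        Regex.Match(text, @"(?<!\S)keyword(?!\S)").Value;
}
```

For a plain yes/no check this makes no difference; it only matters when you use the matched value or replace the match.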
You can also use the first pattern (with non-inverted logic) with (positive) lookarounds:
string pattern = string.Format(@"(?<=\s|^){0}(?=\s|$)", keyword);
However, this doesn't really make a difference to the first pattern, unless you want to find multiple matches in a string.
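The case where it matters is two keywords separated by a single space. In this sketch (hypothetical names), the first pattern consumes the shared space, so Regex.Matches finds only one occurrence, while the lookaround version finds both:

```csharp
using System.Text.RegularExpressions;

// Hypothetical class counting matches with and without consuming whitespace.
public static class AdjacentKeywords
{
    // The trailing \s of the first match eats the space the second match needs.
    public static int CountConsuming(string text) =>
        Regex.Matches(text, @"(?:\s|^)keyword(?:\s|$)").Count;

    // Lookarounds leave the space in place, so both occurrences are found.
    public static int CountLookaround(string text) =>
        Regex.Matches(text, @"(?<=\s|^)keyword(?=\s|$)").Count;
}
```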
By the way, if your keyword might contain regex metacharacters (like |, $, + and so on), make sure to escape it first using Regex.Escape.
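For example, a keyword such as c++ would otherwise be parsed as a quantifier and make the pattern invalid. A sketch using the lookaround pattern and a hypothetical EscapedKeyword helper:

```csharp
using System.Text.RegularExpressions;

// Hypothetical helper: escapes the keyword before inserting it into the pattern.
public static class EscapedKeyword
{
    public static bool ContainsKeyword(string text, string keyword)
    {
        // Regex.Escape("c++") returns c\+\+, so each + is matched literally
        // instead of being parsed as a quantifier.
        string pattern = string.Format(@"(?<!\S){0}(?!\S)", Regex.Escape(keyword));
        return Regex.IsMatch(text, pattern, RegexOptions.IgnoreCase);
    }
}
```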

I am not exactly sure what you are really trying to accomplish with this regex, but the following code will match the string 'keyword' when it appears as a whole word:
string resultString = null;
try {
    Regex regexObj = new Regex(@"\b(keyword)\b");
    resultString = regexObj.Match(subjectString).Value;
} catch (ArgumentException ex) {
    // Syntax error in the regular expression
}
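It can be generally explained as: \b asserts a word boundary, i.e. a position between a word character and a non-word character (or between a word character and the start or end of the string), so \b(keyword)\b only matches keyword as a whole word. In this case I assumed the word of interest was keyword.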
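Because any non-word character counts, punctuation also creates a word boundary. This sketch (hypothetical names) contrasts \b with the whitespace-only lookaround pattern (?<!\S)keyword(?!\S):

```csharp
using System.Text.RegularExpressions;

// Hypothetical class comparing \b with whitespace-only lookarounds.
public static class BoundaryVsWhitespace
{
    // \b only needs a word/non-word transition, so "keyword," still matches.
    public static bool WordBoundary(string text) =>
        Regex.IsMatch(text, @"\bkeyword\b");

    // The lookarounds require whitespace or a string edge on both sides.
    public static bool WhitespaceOnly(string text) =>
        Regex.IsMatch(text, @"(?<!\S)keyword(?!\S)");
}
```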
I also thought, from my interpretation of your question, that you might be interested in matching the entire series of characters that follow the keyword up to the end of the input. If that is the case, the following code will return that match:
string resultString = null;
try {
    Regex regexObj = new Regex(@"\bkeyword\b([\w\s]*)$");
    resultString = regexObj.Match(subjectString).Value;
} catch (ArgumentException ex) {
    // Syntax error in the regular expression
}
This regular expression can be read as follows: the \b on either side of keyword asserts the word boundaries at its beginning and end. ([\w\s]*)$ then matches word characters (\w) and whitespace characters (\s), in any order and as many times as they occur, up to the end of the string ($).
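A sequence such as (\w*\s*) behaves differently from a single class [\w\s]*: it only allows word characters followed by whitespace, so it fails as soon as another word follows the keyword. A sketch with hypothetical names:

```csharp
using System.Text.RegularExpressions;

// Hypothetical class contrasting the two "rest of the input" groups.
public static class RestOfLine
{
    // \w*\s* : word characters first, then whitespace, then the end.
    public static bool Sequential(string text) =>
        Regex.IsMatch(text, @"\bkeyword\b(\w*\s*)$");

    // [\w\s]* : word characters and whitespace may alternate freely.
    public static string Alternating(string text) =>
        Regex.Match(text, @"\bkeyword\b([\w\s]*)$").Groups[1].Value;
}
```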
This next bit of code returns the first line of data that contains the keyword; lines of data that do not contain the keyword will not match.
string resultString = null;
try {
    Regex regexObj = new Regex("^.*keyword.*$", RegexOptions.Multiline);
    resultString = regexObj.Match(subjectString).Value;
} catch (ArgumentException ex) {
    // Syntax error in the regular expression
}
Explained: with RegexOptions.Multiline, the ^ positions at the beginning of a line, the .* matches any characters that are not line breaks, the keyword is then matched, followed by another .* for the remaining non-line-break characters, and the $ asserts the position at the end of that line. Without Multiline, ^ and $ refer to the start and end of the whole string, so the pattern would only match single-line input.
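To see why the Multiline option matters for this line-oriented pattern, here is a sketch that collects every matching line. The helper name is hypothetical, and the input uses \n line endings (with \r\n, each match would keep its trailing \r).

```csharp
using System.Linq;
using System.Text.RegularExpressions;

// Hypothetical helper returning every line that contains "keyword".
public static class WholeLine
{
    public static string[] LinesWithKeyword(string text, RegexOptions options) =>
        Regex.Matches(text, "^.*keyword.*$", options)
             .Cast<Match>()
             .Select(m => m.Value)
             .ToArray();
}
```

Without RegexOptions.Multiline the sample input below yields no matches at all, because the keyword is not on the first line.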
I hope the above is helpful, if not this time then maybe in the future. I am always trying to discover alternative ways to achieve the same results, so if you have any constructive criticism, please post it.
Best wishes,
Steve

Try this:
string pattern = string.Format("^\\s*{0}\\s*$",keyword);
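A quick sketch (hypothetical names) of what this anchored pattern accepts: only inputs that consist of the keyword plus optional leading or trailing whitespace, not a keyword somewhere inside a longer text.

```csharp
using System.Text.RegularExpressions;

// Hypothetical helper using the anchored pattern from this answer.
public static class WholeStringKeyword
{
    // ^ and $ anchor to the whole input, so only surrounding whitespace is allowed.
    public static bool IsOnlyKeyword(string text, string keyword)
    {
        string pattern = string.Format("^\\s*{0}\\s*$", keyword);
        return Regex.IsMatch(text, pattern);
    }
}
```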

I found this other post:
How to specify "Space or end of string" and "space or start of string"?
It answered the question, so my code is now:
string pattern = string.Format("\\b{0}\\b", keyword);
if(Regex.IsMatch(UserText, pattern, RegexOptions.IgnoreCase))
No + is needed after \b: it matches a position rather than a character, so there is nothing to repeat.
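One caveat with \b: it needs a word character on one side, so it fails for keywords that start or end with a non-word character, such as c#. This sketch (hypothetical names, keyword escaped with Regex.Escape) compares it with whitespace-based lookarounds:

```csharp
using System.Text.RegularExpressions;

// Hypothetical class showing where \b and whitespace lookarounds disagree.
public static class BoundaryPitfall
{
    // After "#" there is no word/non-word transition, so the trailing \b fails.
    public static bool WithWordBoundaries(string text, string keyword) =>
        Regex.IsMatch(text, @"\b" + Regex.Escape(keyword) + @"\b", RegexOptions.IgnoreCase);

    // Only requires whitespace or a string edge around the keyword.
    public static bool WithWhitespaceLookarounds(string text, string keyword) =>
        Regex.IsMatch(text, @"(?<!\S)" + Regex.Escape(keyword) + @"(?!\S)", RegexOptions.IgnoreCase);
}
```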

Related

Regex matching the optional substring of "1-1" [duplicate]

I need a regex that will only find matches where the entire string matches my query.
For instance if I do a search for movies with the name "Red October" I only want to match on that exact title (case insensitive) but not match titles like "The Hunt For Red October". Not quite sure I know how to do this. Anyone know?
Thanks!
Try the following regular expression:
^Red October$
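By default, regular expressions are case sensitive, so for the case-insensitive match you asked for, pass RegexOptions.IgnoreCase (or start the pattern with (?i)). The ^ marks the start of the matching text and $ the end.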
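A minimal sketch of the anchored, case-insensitive check; the class and method names are hypothetical:

```csharp
using System.Text.RegularExpressions;

// Hypothetical helper for an exact, case-insensitive title match.
public static class ExactTitle
{
    public static bool IsExactTitle(string title) =>
        Regex.IsMatch(title, "^Red October$", RegexOptions.IgnoreCase);
}
```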
Generally, and with default settings, ^ and $ anchors are a good way of ensuring that a regex matches an entire string.
A few caveats, though:
If you have alternation in your regex, be sure to enclose your regex in a non-capturing group before surrounding it with ^ and $:
^foo|bar$
is of course different from
^(?:foo|bar)$
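The ungrouped form is parsed as (^foo)|(bar$), i.e. "foo at the start or bar at the end". In this sketch (hypothetical names), "foobaz" satisfies it because it starts with foo, while the grouped form rejects it:

```csharp
using System.Text.RegularExpressions;

// Hypothetical class showing how ^ and $ bind with alternation.
public static class AnchoredAlternation
{
    // Parsed as (^foo)|(bar$): "foo" at the start OR "bar" at the end.
    public static bool Ungrouped(string s) => Regex.IsMatch(s, "^foo|bar$");

    // The whole string must be exactly "foo" or exactly "bar".
    public static bool Grouped(string s) => Regex.IsMatch(s, "^(?:foo|bar)$");
}
```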
Also, ^ and $ can take on a different meaning (start/end of line instead of start/end of string) if certain options are set. In text editors that support regular expressions, this is usually the default behaviour. In some languages, especially Ruby, this behaviour cannot even be switched off.
Therefore there is another set of anchors that are guaranteed to only match at the start/end of the entire string:
\A matches at the start of the string.
\Z matches at the end of the string or before a final line break.
\z matches at the very end of the string.
But not all languages support these anchors, most notably JavaScript.
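In .NET, $ without RegexOptions.Multiline behaves like \Z (it also matches just before a final \n), so only \A...\z rejects a trailing line break. A small sketch with hypothetical names:

```csharp
using System.Text.RegularExpressions;

// Hypothetical class comparing ^...$ with \A...\z in .NET.
public static class StringAnchors
{
    // $ also matches right before a final "\n".
    public static bool DollarAnchors(string s) => Regex.IsMatch(s, "^Red October$");

    // \z only matches at the very end of the string.
    public static bool StrictAnchors(string s) => Regex.IsMatch(s, @"\ARed October\z");
}
```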
I know that this may be a little late to answer this, but maybe it will come in handy for someone else.
Simplest way:
var someString = "...";
var someRegex = "...";
var match = Regex.Match(someString, someRegex);
if (match.Success && match.Value.Length == someString.Length) {
    //pass
} else {
    //fail
}
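One caveat with this length check: Regex.Match returns the first match the engine finds, which is not necessarily the longest, so with alternation such as a|ab it can reject an input that would match in full. A sketch (hypothetical names, pattern assumed to be a valid regex) comparing it with wrapping the pattern in \A(?:...)\z, which lets the engine backtrack until a match spans the whole input:

```csharp
using System.Text.RegularExpressions;

// Hypothetical class comparing the length check with explicit anchoring.
public static class FullLengthCheck
{
    // The "simplest way" above: the first match must cover the whole input.
    public static bool FirstMatchCoversInput(string input, string pattern)
    {
        Match match = Regex.Match(input, pattern);
        return match.Success && match.Value.Length == input.Length;
    }

    // Anchoring forces the engine to keep trying alternatives until \z is reached.
    public static bool AnchoredMatch(string input, string pattern) =>
        Regex.IsMatch(input, @"\A(?:" + pattern + @")\z");
}
```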
Use the ^ and $ anchors to denote where the regex pattern sits relative to the start and end of the string:
Regex.IsMatch("Red October", "^Red October$"); // true
Regex.IsMatch("The Hunt for Red October", "^Red October$"); // false
You need to enclose your regex in ^ (start of string) and $ (end of string):
^Red October$
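If the string may contain regex metacharacters (. { } ( ) $ etc.), I propose to use
^\QYourString\E$
\Q starts quoting all the characters until \E.
Otherwise the regex can be wrong or even invalid.
If the language takes the regex as a string parameter (as I see in the example), the backslashes must be doubled:
^\\QYourString\\E$
Note that \Q...\E works in Perl, PCRE and Java, but .NET does not support it: there, \Q is rejected as an unrecognized escape sequence, so in C# use "^" + Regex.Escape(YourString) + "$" instead.
Hope this tip helps somebody.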
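In C#, Regex.Escape is one way to quote a literal title. A sketch of an exact, case-insensitive match (hypothetical names):

```csharp
using System.Text.RegularExpressions;

// Hypothetical helper: matches `candidate` against a literal title.
public static class EscapedTitle
{
    public static bool MatchesTitle(string candidate, string title)
    {
        // Regex.Escape quotes +, (, ), spaces and the other metacharacters.
        string pattern = "^" + Regex.Escape(title) + "$";
        return Regex.IsMatch(candidate, pattern, RegexOptions.IgnoreCase);
    }
}
```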
Sorry, but that's a little unclear.
From what I read, you want to do a simple string comparison. You don't need a regex for that.
string myTest = "Red October";
bool isMatch = string.Equals(myTest, "Red October", StringComparison.OrdinalIgnoreCase);
Console.WriteLine(isMatch); // True
isMatch = string.Equals(myTest, "The Hunt for Red October", StringComparison.OrdinalIgnoreCase);
Console.WriteLine(isMatch); // False
You can also do it with a regex and check it with myRegex.IsMatch(). For example, if I only want to match strings that contain the lowercase letter e exactly once:
^[^e]*e[^e]*$
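A pattern like ^[^e][e]{1}[^e]$ only accepts three-character strings (and the {1} is redundant). A sketch with hypothetical names contrasting it with ^[^e]*e[^e]*$, which allows any length:

```csharp
using System.Text.RegularExpressions;

// Hypothetical class contrasting the two "exactly one e" patterns.
public static class SingleLetter
{
    // Exactly three characters: a non-e, an e, a non-e.
    public static bool ThreeCharForm(string s) => Regex.IsMatch(s, "^[^e][e]{1}[^e]$");

    // Any length, as long as the string contains exactly one lowercase e.
    public static bool AnyLength(string s) => Regex.IsMatch(s, "^[^e]*e[^e]*$");
}
```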

C# regex to match words containing known substrings and not equal to specific keywords

I need to verify if a string contains "error" or "exception" in it, excluding certain keywords: "exception1", "exception2", "includeException", "error1".
This regex seems to do the job:
\b\w*(?!exception1)(?!exception2)(?!includeException)(?!error1)(exception|error)\w*\b
It correctly returns 2 matches when run against the following string:
Test string: "exception1 exception2 exception3 includeException error1 error2"
Matches: "exception3", "error2"
However, if I set the RegexOptions.IgnoreCase flag or add "(?i)" at the beginning of the Regex it also returns a match for "includeException".
What am I missing here?
Using a good Regex tester can help you figure out what's actually being matched. I used this one:
http://regexhero.net/tester/
In the results where it highlights the matches, there is a small button with an 'i' for information. The reason it matches includeException when matching is case-insensitive is that you are matching the latter half of the word: the regex doesn't require whitespace separating the words.
Even with case-sensitive matching, your regex would match if includeException were written as includeexception, because your positive match (exception|error) matches the second half. You can also see this when you start removing spaces: exception1exception2 doesn't match, but exception1exception2exception3 does.
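Running the pattern from the question with and without IgnoreCase reproduces this. A sketch with hypothetical names, using the test string from the question:

```csharp
using System.Linq;
using System.Text.RegularExpressions;

// Hypothetical helper: runs the question's pattern and joins the matches.
public static class OriginalPattern
{
    private const string Pattern =
        @"\b\w*(?!exception1)(?!exception2)(?!includeException)(?!error1)(exception|error)\w*\b";

    public static string MatchList(string input, RegexOptions options) =>
        string.Join(",", Regex.Matches(input, Pattern, options).Cast<Match>().Select(m => m.Value));
}
```

With IgnoreCase, the \w* in front of the lookaheads can stop after "include", so the lookaheads see "Exception" rather than "includeException" and no longer exclude the word.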
While Regex is very compact, there are several ways to get it wrong. A straightforward approach might be a better solution in this case.
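Removing the final * (changing the trailing \w* to \w) makes your pattern work for this test string:
\b\w*(?!exception1)(?!exception2)(?!includeException)(?!error1)(exception|error)\w\b
Be aware that it then requires exactly one word character after exception or error, so a bare "exception" or "error12" will no longer match.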
I see two main problems with your regex:
It has several unanchored lookaheads (when unanchored, they usually do not help unless used in a tempered greedy token and other complex patterns)
The \w* subpatterns are placed on both sides of lookaheads, thus, removing any impact from the lookaheads.
The problem with case-insensitivity is described in Berin's answer: you want to match the word exception, and includeException contains that substring. So, one possible solution is to add a word boundary right before the (exception|error) group:
\b\w*(?!exception1)(?!exception2)(?!includeException)(?!error1)\b(exception|error)\w*\b
However, if you need to match words containing error or exception, that ARE NOT EQUAL to specific keywords, use
\b(?!(?:exception1|exception2|includeException|error1)\b)\w*(exception|error)\w*\b
Here, the lookahead is anchored to the leading word boundary, so it is checked only once at the start of each word, not at every position inside a word. You can shorten it further: \b(?!(?:exception[12]|includeException|error1)\b)\w*(exception|error)\w*\b.
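A sketch (hypothetical names) running the shortened pattern with RegexOptions.IgnoreCase against the test string from the question; includeException is now rejected because the lookahead sees the whole word:

```csharp
using System.Linq;
using System.Text.RegularExpressions;

// Hypothetical helper for the anchored-lookahead pattern.
public static class AnchoredExclusion
{
    // The negative lookahead runs once, right after the leading \b,
    // and rejects words that are exactly one of the excluded keywords.
    private const string Pattern =
        @"\b(?!(?:exception[12]|includeException|error1)\b)\w*(exception|error)\w*\b";

    public static string MatchList(string input) =>
        string.Join(",", Regex.Matches(input, Pattern, RegexOptions.IgnoreCase)
                              .Cast<Match>()
                              .Select(m => m.Value));
}
```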
Now, if you need to match words containing error or exception, that DO NOT CONTAIN specific keywords, use
\b(?!\w*(?:exception1|exception2|includeException|error1))\w*(exception|error)\w*\b
All regex patterns used here are tested at regexhero.net
Regex is not very readable... how about a pure C# solution?
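public static bool ContainsErrorOrExceptionExcept(this string input, string[] excludedKeywords)
{
    // Check each whitespace-separated word on its own, so that an excluded keyword
    // elsewhere in the input (e.g. "exception1") does not hide a valid word ("exception3").
    foreach (string word in input.Split((char[])null, StringSplitOptions.RemoveEmptyEntries))
    {
        // Case-insensitive, matching the RegexOptions.IgnoreCase behaviour asked about.
        bool hasKeyword =
            word.IndexOf("error", StringComparison.OrdinalIgnoreCase) >= 0 ||
            word.IndexOf("exception", StringComparison.OrdinalIgnoreCase) >= 0;
        if (!hasKeyword)
        {
            continue;
        }
        // Exclude only words that are exactly equal to one of the keywords.
        bool isExcluded = Array.Exists(excludedKeywords,
            x => string.Equals(word, x, StringComparison.OrdinalIgnoreCase));
        if (!isExcluded)
        {
            return true;
        }
    }
    return false;
}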

Removing Sub-string with some pattern from a string

I have a string in a JSON-like format:
XYZ DIV Parameters: width=\"1280\" height=\"720\", session=\"1\"
Now I want to remove width=\"1280\" height=\"720\" from this string.
Note: There can be any number in place of 1280 and 720. So, I can't just replace it with null.
How can I solve this, either with a regex or with some better method?
Regex to be replaced with empty string:
(width|height)=\\"\d+\\"
Code:
string input = @"XYZ DIV Parameters: width=\""1280\"" height=\""720\"", session=\""1\""";
string output = Regex.Replace(input, @"(width|height)=\\""\d+\\""", string.Empty);
You could do a find and replace using the following regex:
replace width=\\"\d+\\" with a blank string, and likewise replace height=\\"\d+\\" with a blank string.
This removes the entire text width=\"XYZ\"; if you only wanted to blank out the numbers, you can replace with a string that suits your needs (width=\"\" for example).
If you can guarantee the width and height will ALWAYS be in that format and ALWAYS follow each other separated by a space, you can combine them into one bigger regex find/replace using width=\\"\d+\\" height=\\"\d+\\".
A little more explanation on the regex so you take something away, not just a quick fix :)
width=\\"\d+\\" breaks down to:
width= pretty simple, just find the text you are looking for to start your removal.
\\" since \ is a special char in regex you have to escape it, then the " char can just follow it like normal.
\d+ one or more digits. Because \d can only match digits, it cannot run past the closing \" even though it is greedy. Avoid writing \d*+: in PCRE and Java that is a possessive quantifier (not a non-greedy one), and .NET does not support possessive quantifiers at all, so it rejects the pattern as a nested quantifier.
\\" to end the regex.
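A sketch of the combined replacement in C# (hypothetical names), applied to the sample string from the question:

```csharp
using System.Text.RegularExpressions;

// Hypothetical helper removing both size attributes in one pass.
public static class AttributeRemover
{
    // In the verbatim string, \\ is a regex-escaped backslash and "" is a literal quote,
    // so the regex text is: width=\\"\d+\\" height=\\"\d+\\"
    public static string RemoveSize(string input) =>
        Regex.Replace(input, @"width=\\""\d+\\"" height=\\""\d+\\""", string.Empty);
}
```

The space that preceded width is kept, so the result reads "Parameters: , session=..."; add \s* in front of width if you want it removed as well.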
This will blank out the two values, leaving width=\"\" height=\"\" in place (use "$1$2" as the replacement if you want to remove them entirely):
string resultString = null;
try {
    Regex regexObj = new Regex(@"^(.*?)width=\\"".*?\\"" height=\\"".*?\\""(.*?)$", RegexOptions.IgnoreCase);
    resultString = regexObj.Replace(subjectString, @"$1width=\""\"" height=\""\""$2");
} catch (ArgumentException ex) {
    // Syntax error in the regular expression
}
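A sketch (hypothetical names) running this regex and replacement against the sample string from the question:

```csharp
using System.Text.RegularExpressions;

// Hypothetical helper wrapping the capture-and-rebuild replacement.
public static class AttributeBlanker
{
    private static readonly Regex SizePattern = new Regex(
        @"^(.*?)width=\\"".*?\\"" height=\\"".*?\\""(.*?)$", RegexOptions.IgnoreCase);

    // $1 and $2 re-insert the text before and after the attributes;
    // only the quoted values are dropped.
    public static string BlankSize(string input) =>
        SizePattern.Replace(input, @"$1width=\""\"" height=\""\""$2");
}
```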
