How to match end of string in lookahead? - c#

Why does this not match, and how can I make it work?
Regex.Match("qwe", ".*?(?=([ $]))");
It should match everything up to the first space or the end of the line.

Your specific problem is that you need to use an alternation, not a character class, because inside a character class the $ symbol literally means "match a dollar symbol", and does not have its special meaning end-of-line in that context.
( |$)
It seems however that your example is a bit strange. It would be simpler to match any character except space, then you wouldn't need a lookahead at all.

Try with:
Regex.Match("qwe", "^([^ ]*)");
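The difference between the character class and the alternation can be checked with a short sketch like the one below. The variable names are only illustrative, and the two sample inputs are assumptions chosen to cover both the "space" and the "end of string" case.

```csharp
using System;
using System.Text.RegularExpressions;

// Inside [...] the $ is a literal dollar sign, so this lookahead
// only succeeds if a space or a '$' character follows.
string charClass = @".*?(?=[ $])";
// With an alternation, $ keeps its end-of-string meaning.
string alternation = @".*?(?=( |$))";
// Simpler form without a lookahead: a run of non-space characters.
string negated = @"^[^ ]*";

foreach (string input in new[] { "qwe", "qwe rty" })
{
    Console.WriteLine($"{input}: " +
        $"charClass success={Regex.Match(input, charClass).Success}, " +
        $"alternation='{Regex.Match(input, alternation).Value}', " +
        $"negated='{Regex.Match(input, negated).Value}'");
}
```

For "qwe" the character-class version finds nothing, while the other two patterns both return the whole word.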

Related

How to match string by using regular expression which will not allow same special character at same time?

I'm trying to match a string that does not allow the same special character twice in a row.
My regular expression is:
[RegularExpression(@"^+[a-zA-Z0-9]+[a-zA-Z0-9.&' '-]+[a-zA-Z0-9]$")]
This meets all my requirements except for the two issues below.
This is my string: bracks
Acceptable:
bra-cks, b-r-a-c-ks, b.r.a.c.ks, bra cks (the regular expression above already handles these)
Not acceptable:
Issue 1: b.. or bra..cks, b..racks, bra...cks (two or more special characters together)
Issue 2: bra  cks (two or more whitespace characters together)
You can use a negative lookahead to invalidate strings containing two consecutive special characters:
^(?!.*[.&' -]{2})[a-zA-Z0-9.&' -]+$
Demo: https://regex101.com/r/7j14bu/1
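As a rough check of the lookahead pattern above, the following sketch runs it against the sample strings from the question. The variable names are hypothetical, and the two-space input is an assumption about what "issue 2" was meant to show.

```csharp
using System;
using System.Text.RegularExpressions;

// Pattern from the answer above: the negative lookahead rejects any
// input that contains two adjacent special characters.
var noDoubles = new Regex(@"^(?!.*[.&' -]{2})[a-zA-Z0-9.&' -]+$");

string[] samples = { "bra-cks", "b.r.a.c.ks", "bra cks", "bra..cks", "bra  cks" };
foreach (string s in samples)
{
    Console.WriteLine($"'{s}' -> {noDoubles.IsMatch(s)}");
}
```

Note that this pattern alone does not force the string to start and end with an alphanumeric character.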
The goal
From what I can tell from your description and pattern, you are trying to match text that starts and ends with an alphanumeric character (because of ^+[a-zA-Z0-9] and [a-zA-Z0-9]$ in your original pattern) and that contains no two consecutive (adjacent) special characters, which, again guessing from the regex, are . & ' - and the space.
What was wrong
^+ - I think you wanted to ensure that the match starts at the beginning of the line/string, so you don't need the + here
[a-zA-Z0-9.&' '-] - in this character class you doubled ' which is totally unnecessary
Solution
Please try pattern
^[a-zA-Z0-9](?:(?![.& '-]{2,})[a-zA-Z0-9.& '-])*[a-zA-Z0-9]$
Pattern explanation
^ - anchor, match the beginning of the string
[a-zA-Z0-9] - character class, match one of the characters inside []
(?:...) - non capturing group
(?!...) - negative lookahead
[.& '-]{2,} - match 2 or more of characters inside character class
[a-zA-Z0-9.& '-] - character class, match one of the characters inside []
* - match zero or more repetitions of the preceding pattern
$ - anchor, match the end of the string
Some remarks on your current regex:
It looks like you placed the + quantifiers before the pattern you wanted to quantify, instead of after. For instance, ^+ doesn't make much sense, since ^ is just the start of the input, and most regex engines would not even allow that.
The pattern [a-zA-Z0-9.&' '-]+ doesn't distinguish between alphanumerical and other characters, while you want the rules for them to be different. Especially for the other characters you don't want them to repeat, so that + is not desired for those.
In a character class it doesn't make sense to repeat the same character, like you have a repeat of a quote ('). Maybe you wanted to somehow delimit the space, but realise that those quotes are interpreted literally. So probably you should just remove them. Or if you intended to allow for a quote, only list it once.
Here is a correction (add the quote if you still need it):
^[a-zA-Z0-9]+(?:[.& -][a-zA-Z0-9]+)*$
Follow-up
Based on a comment, I suspect you would allow a non-alphanumerical character to be surrounded by single spaces, even if that gives a sequence of more than one non-alphanumerical character. In that case use this:
^[a-zA-Z0-9]+(?:(?:[ ]|[ ]?[.&-][ ]?)[a-zA-Z0-9]+)*$
So here the space gets a different role: it can optionally occur before and after a delimiter (one of ".&-"), or it can occur on its own. The brackets around the spaces are not needed, but I used them to stress that the space is intended and not a typo.
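To see how the two corrected patterns differ, here is a small sketch that runs both against a few inputs. The names strict and spaced are made up for illustration, and "bra - cks" is an assumed example of a delimiter surrounded by single spaces.

```csharp
using System;
using System.Text.RegularExpressions;

// First correction: exactly one delimiter between alphanumeric runs.
var strict = new Regex(@"^[a-zA-Z0-9]+(?:[.& -][a-zA-Z0-9]+)*$");
// Follow-up: a delimiter may also be padded with single spaces.
var spaced = new Regex(@"^[a-zA-Z0-9]+(?:(?:[ ]|[ ]?[.&-][ ]?)[a-zA-Z0-9]+)*$");

string[] samples = { "bra-cks", "bra cks", "bra - cks", "bra..cks", "bra  cks" };
foreach (string s in samples)
{
    Console.WriteLine($"'{s}': strict={strict.IsMatch(s)}, spaced={spaced.IsMatch(s)}");
}
```

Only "bra - cks" is treated differently by the two patterns; both still reject doubled dots and doubled spaces.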

Variable-length lookbehind for backslashes

What seemed to be a simple task ended up to not work as expected...
I'm trying to match \$\w+\b, unless it's preceded by an uneven number of backslashes.
Examples (only $result should be in the match):
This $result should be matched
This \$result should not be matched
This \\$result should be matched
This \\\$result should not be matched
etc...
The following pattern works:
(?<!\\)(\\\\)*\$\w+\b
However, even repeats of backslashes are included in the match, which is unwanted, so I'm trying to achieve this purely with a variable-length lookbehind, but nothing I tried so far seems to work.
Any regex virtuoso here can lend a hand?
You may use the following pattern:
(?<!(?:^|[^\\])\\(?:\\\\)*)\$\w+\b
Breakdown of the Lookbehind; i.e., not preceded by:
(?:^|[^\\]) - Beginning of string/line or any character other than backslash.
\\ - Then, one backslash character.
(?:\\\\)* - Then, any even number of backslash characters (including zero).
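A quick way to try the lookbehind described above is to run it over the four example lines from the question. This is only a sketch; it relies on .NET's support for variable-length lookbehind, and the variable names are illustrative.

```csharp
using System;
using System.Text.RegularExpressions;

// Negative lookbehind: fail if an odd number of backslashes precedes the $.
string pattern = @"(?<!(?:^|[^\\])\\(?:\\\\)*)\$\w+\b";

string[] inputs =
{
    @"This $result should be matched",
    @"This \$result should not be matched",
    @"This \\$result should be matched",
    @"This \\\$result should not be matched",
};

foreach (string input in inputs)
{
    Match m = Regex.Match(input, pattern);
    // The match value never includes the leading backslashes.
    Console.WriteLine($"{input} -> success={m.Success}, value='{m.Value}'");
}
```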
Looks like asking the question helped me answer my own question.
The part I don't want to be matched has to be wrapped with a positive lookbehind.
(?<=(?<!\\)(\\\\)*)\$\w+\b
Also works if the $result is at the start of the line.
If anyone has more optimal solutions, shoot!
This regular expression gets the wanted text in the third capture group:
(^| )(\\\\)*(\$\w+\b)
Explanation:
(^| ) Either beginning of line or a space
(\\\\)* An even number of backslash characters, including none
( Start of capture group 3
\$\w+\b The wanted text
) End of capture group 3

Can a regex '(?!^()$)' match anything?

Problem
In a very special case, my negative lookahead is an empty list:
(?!^()$)
Is there any string that matches it?
Clarification
Let's say:
(?!^()$)^(.*)$
Will it match everything?
(?!^()$) can be simplified to (?!^$) since () is a null group and will match at any position, all the time.
So now you're saying "match at any and every position where the start and end anchors aren't right next to one another, or in other words, we aren't at an empty string".
Therefore (?!^$) can match at every position in a string that isn't just empty or a newline.
(?!^()$)^(.*)$ is "match everywhere but at empty string" plus ^.*$ which will "match at and consume every single line, empty or not" (anchors ^ and $ have no effect in this case). So it's essentially saying "consume (at least) one or more characters in a string", which can be distilled down to simply .+
Literally anything, beside empty string.
The regex contains 2 parts, (?!^()$) and ^(.*)$ :
(?!^()$) is a negative zero-width match for the empty string. In other words, string.Empty is out.
^(.*)$ is a full match for any character except a newline, repeated 0 to many times, so basically anything.
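The behaviour described above can be tried with a few inputs, as in this sketch. It assumes default options (no RegexOptions.Multiline); the lone-newline case is included because, in .NET, $ also matches just before a final newline.

```csharp
using System;
using System.Text.RegularExpressions;

var regex = new Regex(@"(?!^()$)^(.*)$");

// The empty string and a lone newline are rejected by the lookahead;
// ordinary single-line text is accepted.
string[] inputs = { "", "\n", "abc", " " };
foreach (string input in inputs)
{
    Console.WriteLine($"length {input.Length}: {regex.IsMatch(input)}");
}
```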

C# Regex explicitly match string

I want to match only words from A-Z and a-z with an optional period at the end. This is the code I have so far:
return Regex.IsMatch(word, @"[A-Za-z]\.?");
This means the following should return true: test and test. (with a single trailing period).
The following should return false: t3st, test.., ., .
Right now this regex returns true for everything.
Try this regex:
@"^[A-Za-z]+\.?$"
Boundary matchers
^ means beginning of a line
$ means end of a line
Greedy quantifier
[A-Za-z]+ means [A-Za-z], one or more times
Your regex only asks for a single letter followed by an optional period. Since all your words start with a letter, that regex returns true.
Notice the + in Prince John Wesley's answer - that says to use one or more letters, so it'll "eat" all the letters in the word, stopping at a non-letter. Then the \.? tests for an optional period.
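Note that the ^ and $ are needed in this case: Regex.IsMatch succeeds if the pattern matches anywhere in the input, so without the anchors t3st would still return true, because the t alone is enough for a match.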
You might also want to include the hyphen and the single-quote in your regex; otherwise you'll have problems with hyphenated words and contractions.
That may get messy; some words have more than one hyphen: "port-of-call" or "coat-of-arms", for example. Very rarely you'll find a word with more than one quote: "I'd've" for "I would have". They're rare enough you can probably forget about 'em. (oops! there's a word that starts with a quote... :)
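To compare the original pattern with the anchored one, a sketch along these lines can help. The helper name IsWord is hypothetical and the inputs are taken from the question.

```csharp
using System;
using System.Text.RegularExpressions;

// Original pattern: matches if a single letter occurs anywhere in the input.
var unanchored = new Regex(@"[A-Za-z]\.?");
// Anchored pattern: the whole input must be letters plus an optional period.
var anchored = new Regex(@"^[A-Za-z]+\.?$");

// Hypothetical helper wrapping the anchored check.
bool IsWord(string word) => anchored.IsMatch(word);

foreach (string word in new[] { "test", "test.", "t3st", "test..", "." })
{
    Console.WriteLine($"'{word}': unanchored={unanchored.IsMatch(word)}, anchored={IsWord(word)}");
}
```

The unanchored pattern accepts t3st because IsMatch only needs one matching substring, which is why the anchors matter here.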

RegularExpressionValidator for TextBox

I had a question on here for a RegularExpressionValidator which I'm relatively new to. It was to accept all alphanumeric, apostrophe, hyphen, underscore, space, ampersand, comma, parentheses, full stop.
The answer I was given was:
"^([a-zA-Z0-9 '-_&,()\.])+$"
This seemed good at first, but it seems to accept, among other things, '*'.
Can anybody tell me what I have wrong here?
The problem appears to be the dash - inside a character class, if unescaped and not at the very end or very beginning of the character class, it denotes a range (A-Z would be a good example from your own regex).
Therefore '-_ is also interpreted as a range, and the characters between ASCII 39 (') and ASCII 95 (_) are ()*+,-./0-9:;<=>?@A-Z[\]^.
Put the dash at the end, and you should be fine:
^[a-zA-Z0-9 '_&,().-]+$
Your character class is not quite correct. This part: '-_ creates a range from the apostrophe character to the underscore character. In the ASCII table, the * character falls in between. You need to either escape the hyphen:
^([a-zA-Z0-9 '\-_&,()\.])+$
Or move it somewhere "insignificant", such as the end of the character class:
^([a-zA-Z0-9 '_&,()\.-])+$
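The effect of the accidental '-_ range can be seen by comparing the original pattern with the hyphen-last version. This is a sketch only; the variable names and sample inputs are assumptions made for illustration.

```csharp
using System;
using System.Text.RegularExpressions;

// Original pattern: '-_ forms a range from ' (39) to _ (95), which includes *.
var rangeBug = new Regex(@"^([a-zA-Z0-9 '-_&,()\.])+$");
// Corrected pattern: the hyphen is last, so it is a literal character.
var hyphenLast = new Regex(@"^[a-zA-Z0-9 '_&,().-]+$");

string[] samples = { "O'Neil & Sons (Ltd.)", "well-known_name", "a*b" };
foreach (string s in samples)
{
    Console.WriteLine($"'{s}': original={rangeBug.IsMatch(s)}, corrected={hyphenLast.IsMatch(s)}");
}
```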
In addition to the '-_ issue touched on by other people you also have the + on the end in the wrong place.
The value capture group in this regex:
^([a-zA-Z0-9 '-_&,()\.])+$
in Expresso is the last character in the string.
If you want to capture the whole thing inside the regex then put the + straight after the ] like
^([a-zA-Z0-9 '-_&,()\.]+)$
If you are not bothered about extracting the value captured inside the ( ) then drop the ()
^[a-zA-Z0-9 '-_&,()\.]+$
As I also tripped up on the fact that this uses a character class in my initial answer, I dug around for more info. Found the following tutorial excerpt at http://www.regular-expressions.info/charclass.html
The only special characters or metacharacters inside a character class are the closing bracket (]), the backslash (\), the caret (^) and the hyphen (-). The usual metacharacters are normal characters inside a character class, and do not need to be escaped by a backslash.
Escaping the - with \- should solve your problem.
