I am trying to use Regex to find out if a string matches *abc - in other words, it starts with anything but finishes with "abc"?
What is the regex expression for this?
I tried *abc but "Regex.Matches" returns true for xxabcd, which is not what I want.
abc$
You need the $ to match the end of the string.
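To see the difference the $ makes, here is a minimal sketch. It uses Python's re module purely as a stand-in for illustration (the question is about .NET, and I'd expect the same results there for a pattern this simple); the sample strings are made up.

```python
import re

# "abc$" only matches when "abc" sits at the very end of the input.
ends_with_abc = re.compile(r"abc$")

for s in ["xxabc", "abc", "xxabcd"]:
    print(s, bool(ends_with_abc.search(s)))

# Without the anchor, "abc" is found anywhere, including inside "xxabcd".
print(bool(re.search(r"abc", "xxabcd")))
```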
.*abc$
should do.
So you have a few "fish" here, but here's how to fish.
An online expression library and .NET-based tester: RegEx Library
An online Ruby-based tester (faster than the .NET one): Rubular
A Windows app for testing expressions (most fully-featured, but no zero-width look-aheads or look-behinds): RegEx Coach
Try this instead:
.*abc$
The $ matches the end of the line.
^.*abc$
Will capture any line ending in abc.
It depends on what exactly you're looking for. If you're trying to match whole lines, like:
a line with words and spacesabc
you could do:
^.*abc$
Where ^ matches the beginning of a line and $ the end.
But if you're matching words in a line, e.g.
trying to match thisabc and thisabc but not thisabcd
You will have to do something like:
\w*abc(?!\w)
This means: match any number of consecutive word characters, followed by abc, and then anything but a word character (e.g. whitespace or the end of the line).
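A quick way to check both variants against the sample text from this answer. This is only a sketch in Python's re module (chosen for illustration, not because the question uses it); the patterns are the ones given above.

```python
import re

text = "trying to match thisabc and thisabc but not thisabcd"

# \w*    : any run of word characters
# abc    : the literal suffix
# (?!\w) : not followed by another word character
words = re.findall(r"\w*abc(?!\w)", text)
print(words)

# Whole-line variant: ^ and $ anchor the match to the full string.
whole_line = re.compile(r"^.*abc$")
print(bool(whole_line.search("a line with words and spacesabc")))
print(bool(whole_line.search(text)))
```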
If you want a string of exactly 4 characters ending in abc, use /^.abc$/
Currently I am using this pattern: [HelloWorld]{1,}.
If my input is Hello, it matches.
But if my input is WorldHello, it still matches, which is not right.
How can I make the input string match exactly the value inside the pattern?
Just get rid of the square brackets, and the comma and you're good to go!
HelloWorld{1}
In regex what's between square brackets is a character set.
So [HelloWorld] matches 1 character that's in the set [edlorHW].
And .{1,} or .+ both match 1 or more characters.
What you probably want is the literal word.
So the regex would simply be "HelloWorld".
That would match HelloWorld in the string "blaHelloWorldbla".
What if you want it to match only as a whole word, and not as part of a longer word?
Then you could use word boundaries (\b), which mark the transition between a word character (\w = [A-Za-z0-9_]) and a non-word character (\W = [^A-Za-z0-9_]), or the beginning of a line (^), or the end of a line ($).
For example, @"\bHelloWorld\b" gets a match from "bla HelloWorld bla" but not from "blaHelloWorldbla".
Note that the regex string this time is preceded by @.
That makes it a verbatim string, so the backslashes don't have to be escaped.
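The three cases (character set, literal, whole word) can be compared side by side. A minimal sketch in Python's re module, used here only for illustration; Python raw strings (r"...") play the same role as the verbatim string, and the variable names are my own.

```python
import re

# A character class matches any mix of its letters, so this is too loose.
loose = re.compile(r"[HelloWorld]{1,}")
print(bool(loose.fullmatch("WorldHello")))

# The literal pattern matches the exact text, even inside a longer word.
literal = re.compile(r"HelloWorld")
print(bool(literal.search("blaHelloWorldbla")))
print(bool(literal.fullmatch("WorldHello")))

# \b restricts the match to a standalone word.
whole_word = re.compile(r"\bHelloWorld\b")
print(bool(whole_word.search("bla HelloWorld bla")))
print(bool(whole_word.search("blaHelloWorldbla")))
```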
It seems you need an online regex tester to check your pattern. It is also worth studying the C# regex reference.
Try this pattern:
[a-zA-Z]{1,}
You can test it online
I am looking for a way to get words out of a sentence. I am pretty far with the following expression:
\b([a-zA-Z]+?)\b
but in some cases it counts a word when I don't want it to, for example a word followed by more than one period, like "text..". So, in my regex I want the period to appear at the end of a word zero or one time. Inserting \.? did not do the trick, and variations on this have not yielded anything fruitful either.
Hope someone can help!
A single dot means any character. You must escape it as
\.?
Maybe you want an expression like this:
\w+\.?
or
\p{L}+\.?
You need to add \.? (and not .?) because the period has special meaning in regexes.
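The effect of escaping the dot is easy to see on a small input. A sketch in Python's re module (for illustration only; the sample sentence is made up):

```python
import re

sentence = "one. two three."

# Escaped: \. is a literal period, so only a real "." is picked up.
escaped = re.findall(r"\w+\.?", sentence)
print(escaped)

# Unescaped: . is "any character", so the space after "two" is swallowed.
unescaped = re.findall(r"\w+.?", sentence)
print(unescaped)
```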
To avoid a match on your example "text..", it is not enough to add \.? to check whether the first character after the word is a dot; you also need to look one character further and check the second character after the word.
I ended up with something like this:
\w{2,}\.?[^.]
You should also consider that a sentence does not always end with a . but can also end with ! or ? and the like.
I usually use rubulator.com to quickly test a regexp
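One possible way to express "a word, optionally one period, but never two" is a negative lookahead. This is my own variation, not a pattern from the thread, sketched in Python's re module for illustration; the sample sentence is made up.

```python
import re

sentence = "Hello world. More text.. end"

# \b[a-zA-Z]+\b : a whole word made of letters
# \.?           : optionally one trailing period
# (?!\.)        : but the next character must not be another period
pattern = re.compile(r"\b[a-zA-Z]+\b\.?(?!\.)")

matches = pattern.findall(sentence)
print(matches)
```

Because of the lookahead, "text.." is skipped entirely rather than being matched as a shorter fragment.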
I am updating some code that I didn't write and part of it is a regex as follows:
\[url(?:\s*)\]www\.(.*?)\[/url(?:\s*)\]
I understand that .*? does a non-greedy match of everything in the second register.
What does ?:\s* in the first and third registers do?
Update: As requested, language is C# on .NET 3.5
The syntax (?:) is a way of putting parentheses around a subexpression without separately extracting that part of the string.
The author wanted to match the (.*?) part in the middle, and didn't want the spaces at the beginning or the end getting in the way. Now you can use \1 or $1 (or whatever the appropriate method is in your particular language) to refer to the domain name, instead of the first chunk of spaces at the beginning of the string.
?: makes the parentheses non-capturing. In that regex, you'll only pull out one piece of information, $1, which contains the middle (.*?) expression.
What does ?:\s* in the first and third registers do?
It's matching zero or more whitespace characters, without capturing them.
The regex author intends to allow trailing whitespace in the square-bracket-tags, matching all DNS labels following the "www." like so:
[url]www.foo.com[/url] # foo.com
[url ]www.foo.com[/url ] # same
[url ]www.foo.com[/url] # same
[url]www.foo.com[/url ] # same
Note that the regex also matches:
[url]www.[/url] # empty string!
and fails to match
[url]stackoverflow.com[/url] # no match, bummer
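The cases listed above can be checked directly. A sketch in Python's re module (used for illustration; the question is about C#, where the capture would be read from the first group of the match instead):

```python
import re

# (?:\s*) groups the optional whitespace without capturing it,
# so the only numbered group is the (.*?) in the middle.
url_tag = re.compile(r"\[url(?:\s*)\]www\.(.*?)\[/url(?:\s*)\]")

samples = [
    "[url]www.foo.com[/url]",
    "[url ]www.foo.com[/url ]",
    "[url]www.[/url]",
    "[url]stackoverflow.com[/url]",
]

for s in samples:
    m = url_tag.search(s)
    print(repr(s), "->", repr(m.group(1)) if m else None)

# Only one capturing group exists, despite three pairs of parentheses.
print(url_tag.groups)
```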
You may find this Regular Expressions Cheat Sheet very helpful (hopefully). I spent ages trying to learn Regex with no luck. And once I read this cheat-sheet - I immediately understood what I previously failed to learn.
http://krijnhoetmer.nl/stuff/regex/cheat-sheet/
This is almost certainly something really silly that I've overlooked, but I'm stumped. The following C#-style expression is supposed to match phone numbers (a limited subset of them, but this is just testing...):
^[0-9]{3}-[0-9]{3}-[0-9]{4}$
The search string is as follows:
978-454-0586\r\nother junk\r\nmore junk\r\nhttp://www.google.com\r\n
The expression matches the phone number when in isolation, however not when next to other stuff. For example, if I lop off everything after the phone it works just great.
How can I modify the expression so that it matches the phone number and doesn't get hung up on the rest of the junk?
Thanks!
The ^ and $ symbols mean "beginning of line" and "end of line" respectively. Get rid of them if you want to match in the middle of a line.
"$" in a regular expression matches the end of a line. If you remove that, the regexp should work correctly, though if you have "Foo978-454-0586", it won't work, since "^" matches the start of a line.
Are the phone numbers always on their own lines? If so, add RegexOptions.Multiline to your Regex constructor. Without that, Regex.Match uses the beginning and end of the whole string for ^ and $.
The $ means end of string, not end of line.
The problem is that "^" and "$" force it to match only at the start of the string and the end of the string.
Remove those two anchors and see how you go.
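The suggestions above can be compared on the search string from the question. This is a sketch in Python's re module, used for illustration only: re.MULTILINE plays a role similar to RegexOptions.Multiline, and the optional \r before $ is my own assumption to cope with the \r\n line endings in the sample.

```python
import re

text = "978-454-0586\r\nother junk\r\nmore junk\r\nhttp://www.google.com\r\n"
phone = r"[0-9]{3}-[0-9]{3}-[0-9]{4}"

# Anchored to the whole string: fails because of the lines that follow.
anchored = re.search(r"^" + phone + r"$", text)
print(anchored)

# No anchors: finds the number anywhere in the text.
unanchored = re.search(phone, text)
print(unanchored.group())

# Multiline mode: ^ and $ apply per line. The \r? (an assumption) absorbs
# the carriage return that sits before each \n in this sample.
per_line = re.search(r"^(" + phone + r")\r?$", text, re.MULTILINE)
print(per_line.group(1))
```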
I need to find all matches of word which strictly begins with "$" and contains only digits. So I wrote
[$]\d+
which gave me 4 matches for
$10 $10 $20a a$20
so I thought of using word boundaries using \b:
[$]\d+\b
But it again matched
a$20 for me.
I tried
\b[$]\d+\b
but I failed.
What I want to say is: ACCEPT ONLY IF THE WORD STARTS WITH $ and is followed by DIGITS. How do I tell it that the word STARTS WITH $? I think \b only recognizes word boundaries, meaning boundaries around alphanumeric characters.
What is the solution?
Not the best solution but this should work. (It does with your test case)
(?<=\s+|^)\$\d+\b
Have you tried
\B\$\d+\b
You were close, you just need to escape the $:
\B\$\d+\b
See the example matches here: http://regexhero.net/tester/?id=79d0ac3b-dd2c-4872-abb4-6a9780c91fc1
Try with ^\$\d+
where ^ denotes the beginning of the string.
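Here is a sketch that runs the \B\$\d+\b pattern from the answers against the test string in the question. It uses Python's re module for illustration. The second pattern is my own variant, not from the thread: Python's re does not accept the variable-length look-behind (?<=\s+|^) shown above, so it uses "not preceded or followed by a non-space character" instead.

```python
import re

text = "$10 $10 $20a a$20"

# The original attempt: matches the $-number in all four tokens.
naive = re.findall(r"[$]\d+", text)
print(naive)

# \B before the $ requires that no word boundary sits there, which rules
# out "a$20"; \b after the digits rules out "$20a".
with_non_boundary = re.findall(r"\B\$\d+\b", text)
print(with_non_boundary)

# Variant (an assumption, see above): the token must be delimited by
# whitespace or the string edges on both sides.
whitespace_delimited = re.findall(r"(?<!\S)\$\d+(?!\S)", text)
print(whitespace_delimited)
```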