How can I use a regular expression to get words that start with ! ? For example !Test.
I tried this but it doesn't give any matches:
@"\B\!\d+\b"
Although it did work when I replaced the ! with $.
I'd say that your regex was almost right already; you just need to use \w (a word character: letter, digit, or underscore) instead of \d (digit):
@"\B!\w+\b"
will match any word that is immediately preceded by a !, unless that ! is itself preceded by a word character (that's what the \B asserts). Using a ^ instead will limit the matches to words at the very start of the input (or at the start of each line with RegexOptions.Multiline), which might not be what you want.
So this will match all the words including exactly one preceding ! in this line:
!hello !this ...!will !!!be !matched!
but none of the words in this line:
this! won't!be matched!!!
You could also drop the \B altogether if you don't mind matching !that in this!that.
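As a quick check of the behaviour described above, here is a minimal C# sketch that runs the pattern against both sample lines. The class and method names (BangWords, Find) are made up for illustration and are not from the original answer.

```csharp
using System;
using System.Linq;
using System.Text.RegularExpressions;

public static class BangWords
{
    // Returns every "!word" whose "!" is not directly preceded by a word character.
    public static string[] Find(string input) =>
        Regex.Matches(input, @"\B!\w+\b")
             .Cast<Match>()
             .Select(m => m.Value)
             .ToArray();

    public static void Main()
    {
        // Each word is matched together with exactly one leading "!".
        Console.WriteLine(string.Join(" ", Find("!hello !this ...!will !!!be !matched!")));
        // prints: !hello !this !will !be !matched

        // Here every "!" either follows a word character or is not followed by a word.
        Console.WriteLine(Find("this! won't!be matched!!!").Length);
        // prints: 0
    }
}
```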
This should work: ^!\w+
MatchCollection matches = Regex.Matches (inputText, @"^!\w+");
foreach (Match match in matches)
{
Console.WriteLine (match.Value);
}
I've been trying to extract the word before the match. For example, I have the following sentence:
"Allatoona was a town located in extreme southeastern Bartow County, Georgia."
I want to extract the word before "County", which here is "Bartow".
I've tried the following regex to extract that word:
\w\sCounty,
What I get returned is "w County" when what I wanted is just the word Bartow.
Any assistance would be greatly appreciated. Thanks!
You can use this regex with a lookahead to find word before County:
\w+(?=\s+County)
(?=\s+County) is a positive lookahead that asserts presence of 1 or more whitespaces followed by word County ahead of current match.
If you want to avoid lookahead then you can use a capture group:
(\w+)\s+County
and extract captured group #1 from match result.
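To make the difference between the two variants concrete, here is a small C# sketch that applies both to the sentence from the question. The helper names (WordBeforeCounty, ViaLookahead, ViaGroup) are hypothetical.

```csharp
using System;
using System.Text.RegularExpressions;

public static class WordBeforeCounty
{
    // Lookahead variant: the match itself is the word; " County" is only asserted.
    public static string ViaLookahead(string s) =>
        Regex.Match(s, @"\w+(?=\s+County)").Value;

    // Capture-group variant: the full match includes " County"; group 1 is the word.
    public static string ViaGroup(string s) =>
        Regex.Match(s, @"(\w+)\s+County").Groups[1].Value;

    public static void Main()
    {
        const string s =
            "Allatoona was a town located in extreme southeastern Bartow County, Georgia.";
        Console.WriteLine(ViaLookahead(s)); // prints: Bartow
        Console.WriteLine(ViaGroup(s));     // prints: Bartow
    }
}
```

Both helpers return an empty string when the sentence contains no match, because a failed Regex.Match yields empty values.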
Your \w\sCounty, regex returns w County because \w matches a single character that is either a letter, digit, or _. It does not match a whole word.
To match 1 or more symbols, you need to use a + quantifier and to capture the part you need to extract you can rely on capturing groups, (...).
So, you can fix your pattern by merely replacing \w with (\w+) and then, after getting a match, accessing Match.Groups[1].Value.
However, if the county name contains a non-word symbol, like a hyphen, \w+ won't match it. A \S+ matching 1 or more non-whitespace symbols might turn out a better option in that case.
See a C# demo:
var m = Regex.Match(s, @"(\S+)\s+County");
if (m.Success)
{
Console.WriteLine(m.Groups[1].Value);
}
You can use this regex to find the word before County:
([\w]*.?\s+).?County
[\w]* matches zero or more word characters.
.? allows for one optional special character in the sentence, such as , . or !
\s+ matches the blank space (it also works if there is a double blank space in the sentence).
.? before County allows for a special character placed there.
If you want to find more than one word, just add {n} after the group, like this: ([\w]*.?\s+){3}.?County
I am working on regex expression for a word to be replaced if it is a standalone word and not a part of another word. For example the word "thing". If it is something, the substring "thing" should be ignored there, but if a word "thing" is preceded with a special character such as a dot or a bracket, I want it captured. Also I want the word captured if there is a bracket, dot, or comma (or any other non-alphanumeric character is there) after it.
In the string
Something is a thing, and one more thingy and (thing and more thing
In the sentence above, the 3 words I want marked for replacement are the standalone occurrences of thing: the one after "a", the one after "(", and the last word. I used the following regex:
\bthing\b
I tried out the above sentence on regex101.com, and with this regex only the first word got highlighted. I understand that my regex would not capture (thing, but I thought it would capture the last word in the sentence, so that there would be at least 2 occurrences.
Can someone please help me modify my regex to capture all 3 occurrences in the sentence above?
You were likely using the JavaScript flavor, which returns after the first match is found unless the g modifier is set. If you add the modifier g in the second box on regex101.com, it will find all matches.
This site is better for C# regex testing: http://derekslager.com/blog/posts/2007/09/a-better-dotnet-regular-expression-tester.ashx
If you code this in C# and use the 'Matches' method, it should match multiple times.
Regex regex = new Regex("\\bthing\\b");
foreach (Match match in regex.Matches(
"Something is a thing, and one more thingy and (thing and more thing"))
{
Console.WriteLine(match.Value);
}
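Shorthand for alphanumeric [0-9A-Za-z] is [^\W_] (in .NET, \w is Unicode-aware by default, so this class also covers non-ASCII letters and digits).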
Using a lookbehind and lookahead you'd get
(?<![^\W_])thing(?![^\W_])
Expanded
(?<! [^\W_] ) # Not alphanum behind
thing # 'thing'
(?! [^\W_] ) # Not alphanum ahead
Matches the three standalone occurrences of thing (after "a", after "(", and at the end):
Something is a thing, and one more thingy and (thing and more thing
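The following C# sketch compares the plain \bthing\b pattern with the lookaround version on the sample sentence. Both find the same three occurrences there; they differ only when an underscore is adjacent, since \b treats _ as a word character. The class and method names are invented for this example, and "thing_1" is an extra test string that does not appear in the question.

```csharp
using System;
using System.Text.RegularExpressions;

public static class StandaloneThing
{
    // Plain word boundaries: "_" counts as a word character for \b.
    public static int CountWordBoundary(string s) =>
        Regex.Matches(s, @"\bthing\b").Count;

    // Lookarounds: only a letter or digit on either side blocks the match.
    public static int CountAlnumBoundary(string s) =>
        Regex.Matches(s, @"(?<![^\W_])thing(?![^\W_])").Count;

    public static void Main()
    {
        const string s =
            "Something is a thing, and one more thingy and (thing and more thing";
        Console.WriteLine(CountWordBoundary(s));          // prints: 3
        Console.WriteLine(CountAlnumBoundary(s));         // prints: 3

        // Hypothetical extra input showing where the two patterns differ.
        Console.WriteLine(CountWordBoundary("thing_1"));  // prints: 0
        Console.WriteLine(CountAlnumBoundary("thing_1")); // prints: 1
    }
}
```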
String is given below from which i want to extract the text.
String:
Hello Mr John and Hello Ms Rita
Regex
Hello(.*?)Rita
I am trying to get the text between 2 strings, "Hello" and "Rita". I am using the regex given above, but it is giving me
Mr John and Hello Ms
which is wrong. I need only "Ms". Can anyone help me write a proper regex for this situation?
Use a tempered greedy token:
Hello((?:(?!Hello|Rita).)*)Rita
      ^^^^^^^^^^^^^^^^^^^^
The (?:(?!Hello|Rita).)* is the tempered greedy token: it matches one character at a time, but only at positions where neither Hello nor Rita starts. You may add word boundaries \b if you need to check for whole words.
In order to get a Ms without spaces on both ends, use this regex variation:
Hello\s*((?:(?!Hello|Rita).)*?)\s*Rita
Adding the ? to * will form a lazy quantifier *? that matches as few characters as needed to find a match, and \s* will match zero or more whitespaces.
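A short C# sketch, using the string from the question, shows how the original lazy pattern and the tempered variant differ. The names BetweenWords, Lazy, and Tempered are illustrative only.

```csharp
using System;
using System.Text.RegularExpressions;

public static class BetweenWords
{
    // Original attempt: matching starts at the first "Hello" and runs up to "Rita".
    public static string Lazy(string s) =>
        Regex.Match(s, @"Hello(.*?)Rita").Groups[1].Value;

    // Tempered token: the captured text may not contain "Hello" or "Rita",
    // so only the last "Hello" before "Rita" can start a successful match.
    public static string Tempered(string s) =>
        Regex.Match(s, @"Hello\s*((?:(?!Hello|Rita).)*?)\s*Rita").Groups[1].Value;

    public static void Main()
    {
        const string s = "Hello Mr John and Hello Ms Rita";
        Console.WriteLine("[" + Lazy(s) + "]");     // prints: [ Mr John and Hello Ms ]
        Console.WriteLine("[" + Tempered(s) + "]"); // prints: [Ms]
    }
}
```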
To get the match closest to the ending word, put a greedy .* in front of the initial word so that it consumes as much text as possible first.
.*Hello(.*?)Rita
Or without whitespace in captured: .*Hello\s*(.*?)\s*Rita
Or with use of two capture groups: .*(Hello\s*(.*?)\s*Rita)
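Your (.*?) is picking up too much text because the engine starts matching at the first "Hello", and .*? will extend over any characters, including the second "Hello", until it finds "Rita". So it grabs everything from the first "Hello" to "Rita" at the end.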
One easy way you could get what you want is with this regular expression:
Hello (\S+) Rita
\S matches any non-whitespace character, so \S+ matches any consecutive string of non-whitespace characters, i.e. a single word.
This would be a bit more robust, allowing for multiple spaces or other whitespace between the words:
Hello\s+(\S+)\s+Rita
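You can also use a lookbehind and a lookahead: (?<=Hello).*?(?=Rita). Note that on this input the match still starts right after the first Hello, so it runs from there up to Rita.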
The regex below captures -aaa and -ccc but not -eee.
How can I capture all three?
keywords = "-aaa bbb -ccc -eee";
MatchCollection wordColNegEnd = Regex.Matches(keywords, @"-(.*?) ");
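Use a "word boundary" /\b/ instead of a space; it matches at the end of the string as well as at any word/non-word boundary. Use .+? rather than .*? here, because a lazy .*? may match nothing and stop at the boundary directly after the dash:
Regex.Matches(keywords, @"-(.+?)\b");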
or, depending on what characters may be in the strings, just use "word characters" /\w/ to match the pattern:
Regex.Matches(keywords, @"-(\w+)");
MatchCollection wordColNegEnd = Regex.Matches(keywords, @"-(.+?)\b");
Word boundary is better than space; please give someone else the upvotes though, since I brain farted the purpose of it.
I kept the ? from your original: it makes the quantifier lazy, so the match stops at the first word boundary instead of running to the last one. I changed * to + so that at least one character has to follow the dash.
Use
MatchCollection wordColNegEnd = Regex.Matches(keywords, @"-(.+?)\b");
Currently, your regex requires a trailing space after the capturing group. The strings "aaa" and "ccc" have this, but "eee" does not.
Instead of matching any characters occurring after a dash, try matching non-space characters (greedy, since a lazy \S*? with nothing after it would match zero characters):
@"-(\S+)"
keywords = "-aaa bbb -ccc -eee";
MatchCollection wordColNegEnd = Regex.Matches(keywords, @"-\w+");
You haven't specified what exactly you are trying to match here.
But if I understood it right, you want to match any alpha string that starts with -
Use this RegEx: -[a-z]+
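To see why the original pattern misses the last keyword, this C# sketch runs it next to the -(\w+) pattern suggested in the answers. The helper name Capture and the class name DashKeywords are assumptions made for the example.

```csharp
using System;
using System.Linq;
using System.Text.RegularExpressions;

public static class DashKeywords
{
    // Returns the text of capture group 1 for every match of the pattern.
    public static string[] Capture(string input, string pattern) =>
        Regex.Matches(input, pattern)
             .Cast<Match>()
             .Select(m => m.Groups[1].Value)
             .ToArray();

    public static void Main()
    {
        const string keywords = "-aaa bbb -ccc -eee";

        // Original pattern: needs a trailing space, so "-eee" at the end is missed.
        Console.WriteLine(string.Join(",", Capture(keywords, @"-(.*?) ")));
        // prints: aaa,ccc

        // Word characters after the dash: no trailing space required.
        Console.WriteLine(string.Join(",", Capture(keywords, @"-(\w+)")));
        // prints: aaa,ccc,eee
    }
}
```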
I want a Regular Expression for a word.otherword form. I tried \b[a-z]\.[a-z]\b, but it gives me an error at the \. part, saying Unrecognized escape sequence. Any idea what's wrong? I'm working under .NET C#. Thanks!
Later edit:
john.Smith or JoHn.SmItH or JOHN.SMITH should work.
John Smith or john!Smith or john.Smith.Smith shouldn't work.
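Try this:
foundMatch = Regex.IsMatch(SubjectString, @"\b[a-z]\.[a-z]\b");
Probably you were not using the @ verbatim string prefix? Without it, the C# compiler tries to interpret \. as a string escape sequence.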
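As a small illustration of the point about the string prefix, the sketch below shows that a verbatim literal and a regular literal with doubled backslashes produce the same pattern text. The class and constant names are hypothetical.

```csharp
using System;
using System.Text.RegularExpressions;

public static class VerbatimPattern
{
    // Verbatim literal: backslashes reach the regex engine unchanged.
    public const string Verbatim = @"\b[a-z]\.[a-z]\b";

    // Regular literal: every backslash has to be doubled for the compiler.
    public const string Escaped = "\\b[a-z]\\.[a-z]\\b";

    public static void Main()
    {
        Console.WriteLine(Verbatim == Escaped);            // prints: True
        Console.WriteLine(Regex.IsMatch("a.b", Verbatim)); // prints: True
    }
}
```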
Your regex matches a.a, that is, a single character on each side of the dot. Since you want it to match complete words, you need a quantifier, e.g.
\b[a-z]+\.[a-z]+\b
Finally you may want to use the case insensitive match to allow for words with capital letters to be matched too :
foundMatch = Regex.IsMatch(SubjectString, @"\b[a-z]+\.[a-z]+\b", RegexOptions.IgnoreCase);
This will match all words.words with at least one character for each word regardless of capitalization.
This will match word.otherword only if there is whitespace before the first word or it is at the start of the string, and only if there is whitespace after the second word or it is at the end of the string.
foundMatch = Regex.IsMatch(SubjectString, @"(?<=\s|^)\b[a-z]+\.[a-z]+\b(?=\s|$)", RegexOptions.IgnoreCase);
Try this regex for word.word format (the same word repeated, via the backreference \1):
@"\b([a-z]+)\.\1"
For word.otherword use this:
@"\b[a-z]+\.[a-z]+\b"
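As a rough check against the examples listed in the question, this C# sketch applies the stricter lookaround pattern from the earlier answer with RegexOptions.IgnoreCase. The class name DottedName and the method IsDottedName are invented for illustration.

```csharp
using System;
using System.Text.RegularExpressions;

public static class DottedName
{
    // word.otherword, delimited by whitespace or the string boundaries.
    private static readonly Regex Strict = new Regex(
        @"(?<=\s|^)\b[a-z]+\.[a-z]+\b(?=\s|$)", RegexOptions.IgnoreCase);

    public static bool IsDottedName(string s) => Strict.IsMatch(s);

    public static void Main()
    {
        var samples = new[]
        {
            "john.Smith", "JoHn.SmItH", "JOHN.SMITH",      // expected: True
            "John Smith", "john!Smith", "john.Smith.Smith" // expected: False
        };
        foreach (var s in samples)
        {
            Console.WriteLine(s + " -> " + IsDottedName(s));
        }
    }
}
```

The plain \b[a-z]+\.[a-z]+\b pattern would still find john.Smith inside john.Smith.Smith; the lookarounds are what reject that input.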