Using regex, I need to find a word within a string that starts with a specific character. The word must be alphanumeric but may contain underscores (_) inside it. An underscore at the beginning or end of the word is not acceptable.
For example I have the following string.
#word1 Message ## message # message #word2_ message #word#3 #_word4 mesagge #word_5
The result should be:
#word1 #word_5
Thanks.
Use regex pattern
(?:^|(?<=\s))#(?!_)\w+(?<!_)(?:(?=\s)|$)
or
(?:^|(?<=\W))#(?!_)\w+(?<!_)(?:(?=\W)|$)
depending on what you want to allow in front of and behind the match.
For example, use the second pattern if #word1 in #word_5 #word1. #word#2 #word*3 should match, treating the dot (.) as a separator or end of sentence.
This Regex will do it!
(?<=(^|\s))#([a-zA-Z0-9]{1}\w*[a-zA-Z0-9]|[a-zA-Z0-9]{1})(?=(\s|$))
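It also matches a single-character word such as #a.
This will work. The bounds (lines 1 and 3) are fairly heavy because \b, the word boundary, won't work here: you don't want to match "#word#3", and the "#" character after "d" triggers a word boundary. The pattern is split over three lines for readability only; join the lines before use (in free-spacing mode an unescaped # would start a comment).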
(?<=\s|^)
#(?!_)\w+(?<!_)
(?=\s|$)
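As a quick check of the lookaround approach above, here is a minimal sketch in Python (an assumption, since the question does not name a language). It uses the first pattern from the answers, whose fixed-width lookbehind Python's re module accepts, and the sample string from the question. The name find_tags is made up for illustration.

```python
import re

# First pattern from the answers above. (?:^|(?<=\s)) keeps the lookbehind
# fixed-width, which Python's re module requires.
TAG_PATTERN = re.compile(r"(?:^|(?<=\s))#(?!_)\w+(?<!_)(?:(?=\s)|$)")


def find_tags(text):
    """Return every #word that stands alone and has no leading/trailing underscore."""
    return TAG_PATTERN.findall(text)


sample = ("#word1 Message ## message # message #word2_ message "
          "#word#3 #_word4 mesagge #word_5")
print(find_tags(sample))  # ['#word1', '#word_5']
```

The three-line variant with (?<=\s|^) is not used here because Python rejects lookbehinds whose alternatives differ in width.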
Currently I am using this pattern: [HelloWorld]{1,}.
If my input is Hello, it matches.
But if my input is WorldHello, it still matches, which is not right.
How can I make the input match exactly the value inside the pattern?
Just get rid of the square brackets, and the comma and you're good to go!
HelloWorld{1}
In regex what's between square brackets is a character set.
So [HelloWorld] matches 1 character that's in the set [edlorHW].
And {1,} means the same as +: one or more repetitions. So [HelloWorld]{1,} matches any run of those characters, in any order.
What you probably want is the literal word.
So the regex would simply be "HelloWorld".
That would match HelloWorld in the string "blaHelloWorldbla".
What if you want it to match only as a whole word, and not as part of a longer word?
Then you could use word boundaries (\b), which mark the transition between a word character (\w = [A-Za-z0-9_]) and a non-word character (\W = [^A-Za-z0-9_]), or between a word character and the beginning (^) or end ($) of a line.
For example, @"\bHelloWorld\b" gets a match from "bla HelloWorld bla" but not from "blaHelloWorldbla".
Note that the regex string this time was preceded by @.
That makes it a verbatim string, so the backslashes don't have to be escaped.
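The difference between the character class, the literal, and the word-boundary version can be seen in a small sketch. It is written in Python as an assumption for illustration (the thread itself is about C#); the raw strings (r"...") keep the backslashes unescaped, and the variable names are made up.

```python
import re

# A character class matches any ONE of its characters, so a repeated class
# accepts any mix of them, in any order.
char_class = re.compile(r"[HelloWorld]{1,}")
print(bool(char_class.fullmatch("WorldHello")))  # True: not what was intended

# The literal pattern only matches that exact sequence of characters.
literal = re.compile(r"HelloWorld")
print(bool(literal.search("blaHelloWorldbla")))  # True: found inside a longer word

# \b on both sides requires the match to be a whole word.
whole_word = re.compile(r"\bHelloWorld\b")
print(bool(whole_word.search("bla HelloWorld bla")))  # True
print(bool(whole_word.search("blaHelloWorldbla")))    # False
```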
It seems you should check your pattern with an online regex tester. It is also worth studying the C# regex reference.
Try this pattern:
[a-zA-Z]{1,}
You can test it online
I need to find all the words that have between 15 and 20 characters in a big string. I also want to avoid matching a long word that has something else attached at the end (for example 1234567890abcdef#asdf.com). I don't want that as a result, only plain words. Right now I'm splitting the string on white space and applying the following regular expression to each word:
^[a-zA-Z0-9]{15,20}$
Is there any chance to do both things using one regular expression?
I'm using C#.
Good examples to catch:
1234567890abcdeg
qwertyuiopasdfgh
1234567890abcdeg, (catch it but remove ",")
Examples to avoid: 1234567890abcdeg#gmail.com
Don't use start/end anchors (^/$), but word delimiters (\b):
\b[a-zA-Z0-9]{15,20}(?=[\s,]|$)
I used (?=[\s,]|$) instead of the end delimiter to force a space character or a comma or the end of the string. Expand it as needed.
You may want to do likewise for the first \b if you need to, for instance: (?<=\s|^).
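The lookahead pattern above can be tried on the examples from the question without splitting the string first. This is a minimal sketch in Python (an assumption for illustration; the question uses C#), and the name LONG_WORD and the trailing word "tooshort" are made up.

```python
import re

# \b at the start, and a lookahead that only accepts whitespace, a comma,
# or the end of the string after the word.
LONG_WORD = re.compile(r"\b[a-zA-Z0-9]{15,20}(?=[\s,]|$)")

text = ("1234567890abcdeg qwertyuiopasdfgh 1234567890abcdeg, "
        "1234567890abcdeg#gmail.com tooshort")
print(LONG_WORD.findall(text))
# ['1234567890abcdeg', 'qwertyuiopasdfgh', '1234567890abcdeg']
```

Because the comma sits in a lookahead, it is checked but never becomes part of the match, so no extra trimming is needed.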
Normally, you would use word boundaries (\b) before and after the alphanumerics:
\b[a-zA-Z0-9]{15,20}\b
However, there's a small detail to take into account: underscores ("_") are also considered word characters. The previous regex won't match the following text:
12345678901234567_
In order to avoid it, you can check if it's preceded and followed by either a \b or a "_", with lookarounds.
Regex:
(?<=\b|_)[a-zA-Z0-9]{15,20}(?=\b|_)
I need a regular expression that matches only words meeting the following conditions. I am using it in my C# program.
Can be any case
Should not have any numbers
May contain - and ' characters, but they are optional
Should start with a letter
I have tried using the expression ^([a-zA-Z][\'\-]?)+$ but it doesn't work.
Here are list of few words that are acceptable
London (Case insensitive)
Jackson's
non-profit
Here are a list of few words that are not acceptable
12london (contains a number and does not start with a letter)
-to (does not start with a letter)
to: (contains a : character; no special character other than - and ' is allowed)
^[a-zA-Z][-'a-zA-Z]*$
This matches any word that starts with an alphabetical character, followed by any number of alphabetical characters, - or '.
Note that you don't need to escape the - and ' inside a character class ([]), as long as the dash is either the first or the last character in the class.
Note also that I've removed the round brackets from your example - if you don't want to capture the input, you'll get better performance by leaving them out.
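To see the pattern above against the word lists from the question, here is a minimal sketch in Python (an assumption for illustration; the question uses C#). The names WORD, accepted and rejected are made up.

```python
import re

# A letter first, then any number of letters, dashes or apostrophes.
WORD = re.compile(r"^[a-zA-Z][-'a-zA-Z]*$")

accepted = ["London", "Jackson's", "non-profit"]
rejected = ["12london", "-to", "to:"]

# Prints True for the first three words and False for the last three.
for word in accepted + rejected:
    print(word, bool(WORD.match(word)))
```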
Try this one:
^[A-Za-z]+[A-Za-z'-]*$
First of all, try your regexes against tools such as http://www.regextester.com/
You are testing strings that both start with AND end with your pattern (^ means start of line, $ is the end), thus leaving out all of the words contained between two spaces.
To find words inside a longer text, use boundaries such as \b or lookarounds instead of ^ and $.
Be careful with shorthand classes such as \D (not a digit): they are broader than [a-zA-Z] and also match punctuation and whitespace.
Let me know if the following works in your scenario.
(?<!\S)[a-zA-Z][a-zA-Z'-]*(?!\S)
It says: a letter that is not preceded by a non-whitespace character, followed by any number of letters, ' or - characters, and not followed by a non-whitespace character.
I am trying to use Regex in C# to look for a list of keywords in a bunch of text. However I want to be very specific about what the "surrounding" text can be for something to count as a keyword.
So for example, the keyword "hello" should be found in (hello), hello., hello< but not in hellothere.
My main problem is that I don't REQUIRE the separators, if the keyword is the first word or the last word it's okay. I guess another way to look at it is that the beginning-of-the-file and the end-of-the-file should be acceptable separators.
I'm new to Regex so I was hoping someone could help me get the pattern right. So far I have:
[ <(.]+?keyword[<(.]+?
where <, (, . are some example separators and keyword is of course the keyword I am looking for.
You could use the word boundary anchor:
\bkeyword\b
which would find your keyword only when not part of a larger word.
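The word-boundary approach can be checked against the examples from the question. This is a minimal sketch in Python (an assumption for illustration; the question uses C#), and contains_keyword is a hypothetical helper name.

```python
import re


def contains_keyword(keyword, text):
    """True if keyword occurs in text as a whole word (hypothetical helper)."""
    # re.escape guards against keywords that contain regex metacharacters.
    pattern = r"\b" + re.escape(keyword) + r"\b"
    return re.search(pattern, text) is not None


# Prints True for every sample except "hellothere".
for sample in ["(hello)", "hello.", "hello<", "hello", "hellothere"]:
    print(sample, contains_keyword("hello", sample))
```

Note that \b treats digits and underscores as word characters, so hello2 or hello_world would not count as a match, and it only behaves this way when the keyword itself starts and ends with a word character.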
You will want to look into the word boundary (\b) to avoid matching keywords that appear as a part of another word (as in your hellothere example).
You can also add matching at beginning of line (^) and end of line ($) to control the position where keywords may appear.
I think you want something like:
(^|[ <(.])keyword($|[<(.])
The ^ and $ characters stand for the start and end of the input text, respectively. (If you specify the Multiline option, they match at the start/end of each line rather than of the whole text; you want the default behavior here. The Singleline option is unrelated: it only changes what . matches.)
I need to find all words that strictly begin with "$" and otherwise contain only digits. So I wrote
[$]\d+
which gave me 4 matches for
$10 $10 $20a a$20
so I thought of using word boundaries using \b:
[$]\d+\b
But it again matched
a$20 for me.
I tried
\b[$]\d+\b
but I failed.
I want to say: ACCEPT ONLY IF THE WORD STARTS WITH $ and is followed by DIGITS. How do I say IT STARTS WITH $? I think \b assumes word boundaries, which means being surrounded by alphanumeric characters.
What is the solution?
Not the best solution but this should work. (It does with your test case)
(?<=\s+|^)\$\d+\b
Have you tried this?
\B\$\d+\b
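You were close: escape the $ and use \B (not a word boundary) instead of the leading \b. Since $ is a non-word character, \b in front of it requires a word character on its left, which is exactly the a$20 case; \B requires the opposite.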
\B\$\d+\b
See the example matches here: http://regexhero.net/tester/?id=79d0ac3b-dd2c-4872-abb4-6a9780c91fc1
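The difference between \b and \B in front of the $ can be seen on the test string from the question. This is a minimal sketch in Python (an assumption for illustration; the thread appears to use a .NET tester).

```python
import re

text = "$10 $10 $20a a$20"

# \b before "$" needs a word character on its left, so only a$20 qualifies.
print(re.findall(r"\b[$]\d+\b", text))  # ['$20']

# \B before "$" needs a non-word character (or the start of the text) there.
print(re.findall(r"\B\$\d+\b", text))   # ['$10', '$10']
```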
Try with ^\$\d+
where ^ denotes the beginning of the string, so this only matches a value at the very start of the input.