How to match exactly one or more characters inside boundary - c#

Currently I am using this pattern: [HelloWorld]{1,}.
If my input is Hello, it matches.
But if my input is WorldHello, it still matches, which is not what I want.
How can I make the input string match exactly the value inside the pattern?

Just get rid of the square brackets, and the comma and you're good to go!
HelloWorld{1}

In regex what's between square brackets is a character set.
So [HelloWorld] matches 1 character that's in the set [edlorHW].
And .{1,} or .+ both match 1 or more characters.
What you probably want is the literal word.
So the regex would simply be "HelloWorld".
That would match HelloWorld in the string "blaHelloWorldbla".
If you want it to match only as a whole word, and not as part of a longer word,
then you could use word boundaries \b, which match at the transition between a word character (\w = [A-Za-z0-9_]) and a non-word character (\W = [^A-Za-z0-9_]), or between a word character and the start or end of the string.
For example, @"\bHelloWorld\b" gets a match from "bla HelloWorld bla" but not from "blaHelloWorldbla".
Note that the regex string this time was preceded by @.
That makes it a verbatim string, so the backslashes don't have to be escaped.
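A minimal sketch of the difference between the character set, the word-bounded literal, and a fully anchored literal. The class and method names are made up for illustration, and the \A...\z variant assumes that "match exactly" means the whole input must equal HelloWorld.

```csharp
using System;
using System.Text.RegularExpressions;

static class HelloWorldDemo
{
    // The original pattern: one or more characters taken from the set
    // {H, e, l, o, W, r, d}, in any order.
    public static bool MatchesCharacterSet(string input) =>
        Regex.IsMatch(input, "[HelloWorld]{1,}");

    // The literal word, delimited by word boundaries.
    public static bool ContainsWholeWord(string input) =>
        Regex.IsMatch(input, @"\bHelloWorld\b");

    // Assumption: the whole input must be exactly the literal.
    public static bool IsExactly(string input) =>
        Regex.IsMatch(input, @"\AHelloWorld\z");

    static void Main()
    {
        Console.WriteLine(MatchesCharacterSet("WorldHello"));      // True
        Console.WriteLine(ContainsWholeWord("bla HelloWorld bla")); // True
        Console.WriteLine(ContainsWholeWord("blaHelloWorldbla"));   // False
        Console.WriteLine(IsExactly("HelloWorld"));                 // True
        Console.WriteLine(IsExactly("WorldHello"));                 // False
    }
}
```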

It seems you need an online regex tester to check your pattern. It is also worth studying the C# regex reference.

Try this pattern:
[a-zA-Z]{1,}
You can test it online

Related

.Net Regular Expression matching the string "C#"

I am trying to build a regular expression (using .Net's RegEx object) to match the sequence of characters "C#" with word boundaries.
So searching inside the string "I am a C# developer, but I am not a C#developper", I am trying to match the first "C#" (as a word) but not the second "C#" that is a part of a word.
I have tried the pattern "\bC#\b", with no matches.
I have also tried the pattern "\bC\#\b" (trying to escape the #), no matches.
I have read somewhere that the pound (#) sign can be interpreted as word boundary. Is this true? And if so, how can we look for that string ("C#") as a word?
The \b does not match between the pound sign and a space because both are non-word characters, but it does match between the pound sign and the d character.
Instead of a second word boundary \b, you could assert that what is on the right is not a non-whitespace character \S, using a negative lookahead (?!...):
\bC#(?!\S)
As pointed out in the comments by @elgonzo, to keep the match when a non-word character (such as a comma) follows C#, you could use a positive lookahead to assert that what is on the right is either a non-word character \W or the end of the string $:
\bC#(?=\W|$)
The following regex matches C# when it is the first word, the last word, or surrounded by whitespace, case-insensitively:
/(?:^|\s)C#(?:$|\s)/i
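As a rough sketch, the two lookahead variants can be compared by listing the start index of every match. The helper name MatchIndexes and the second test sentence are invented for this example.

```csharp
using System;
using System.Linq;
using System.Text.RegularExpressions;

static class CSharpWordDemo
{
    // Returns the start index of every match of pattern in input.
    public static int[] MatchIndexes(string input, string pattern) =>
        Regex.Matches(input, pattern).Cast<Match>().Select(m => m.Index).ToArray();

    static void Main()
    {
        string sentence = "I am a C# developer, but I am not a C#developper";
        string withComma = "I like C#, really"; // hypothetical extra input

        // Only the standalone "C#" is found; "C#developper" is skipped.
        Console.WriteLine(string.Join(",", MatchIndexes(sentence, @"\bC#(?!\S)")));      // 7
        Console.WriteLine(string.Join(",", MatchIndexes(sentence, @"\bC#(?=\W|$)")));    // 7

        // A comma right after "C#" is non-whitespace, so (?!\S) rejects it,
        // while (?=\W|$) accepts it.
        Console.WriteLine(MatchIndexes(withComma, @"\bC#(?!\S)").Length);                // 0
        Console.WriteLine(string.Join(",", MatchIndexes(withComma, @"\bC#(?=\W|$)")));   // 7
    }
}
```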

C# - Removing single word in string after certain character

I have a string from which I would like to remove any word following a "\", whether in the middle or at the end, such as:
testing a\determiner checking test one\pronoun
desired result:
testing a checking test one
I have tried a simple regex that removes anything between the backslash and whitespace, but it gives the following result:
string input = @"testing a\determiner checking test one\pronoun";
Regex regex = new Regex(@"\\.*\s");
string output = regex.Replace(input, " ");
Result:
testing a one\pronoun
It looks like this regex matches from the backslash until the last whitespace in the string. I cannot seem to figure out how to match from the backslash to the next whitespace. Also, I am not guaranteed a whitespace at the end, so I would need to handle that. I could continue processing the string and remove any text after the backslash, but I was hoping I could handle both cases in one step.
Any advice would be appreciated.
Change .*, which matches any characters, to \w*, which only matches word characters.
Regex regex = new Regex(@"\\\w*");
string output = regex.Replace(input, "");
".*" matches zero or more characters of any kind. Consider using "\w+" instead, which matches one or more "word" characters (not including whitespace).
Using "+" instead of "*" would allow a backslash followed by a non-"word" character to remain unmatched. For example, no matches would be found in the sentence "Sometimes I experience \ an uncontrollable compulsion \ to intersperse backslash \ characters throughout my sentences!"
With your current pattern, .* tells the parser to be "greedy," that is, to take as much of the string as possible while still ending on a space. Adding a ? right after that * tells it instead to make the match as small as possible, so it stops as soon as it hits the first space.
Next, you want to end at not just a space, but at either a space or the end of the string. The $ symbol matches the end of the string, and | means or. Group those together using parentheses and your group collectively tells the parser to stop at either a space or the end of the string. Your code will look like this:
string input = @"testing a\determiner checking test one\pronoun";
Regex regex = new Regex(@"\\.*?(\s|$)");
string output = regex.Replace(input, " ");
Try this regex:
(\\[^\s]*)
1st Capturing group (\\[^\s]*)
\\ matches the character \ literally
[^\s]* match a single character not present in the list below
Quantifier: * Between zero and unlimited times, as many times as possible, giving back as needed [greedy]
\s match any white space character [\r\n\t\f ].
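To compare the suggestions above side by side, here is a small sketch that runs each pattern over the sample input. The wrapper name Strip is invented for the example; it only forwards to Regex.Replace.

```csharp
using System;
using System.Text.RegularExpressions;

static class BackslashWordDemo
{
    // Thin wrapper so each pattern can be tried with its own replacement text.
    public static string Strip(string input, string pattern, string replacement) =>
        Regex.Replace(input, pattern, replacement);

    static void Main()
    {
        string input = @"testing a\determiner checking test one\pronoun";

        // Greedy original: runs from the first backslash to the LAST whitespace.
        Console.WriteLine("[" + Strip(input, @"\\.*\s", " ") + "]");
        // [testing a one\pronoun]

        // Backslash plus the word characters that follow it.
        Console.WriteLine("[" + Strip(input, @"\\\w*", "") + "]");
        // [testing a checking test one]

        // Lazy match up to whitespace or end of string; note the trailing space.
        Console.WriteLine("[" + Strip(input, @"\\.*?(\s|$)", " ") + "]");
        // [testing a checking test one ]

        // Backslash plus any run of non-whitespace characters.
        Console.WriteLine("[" + Strip(input, @"\\[^\s]*", "") + "]");
        // [testing a checking test one]
    }
}
```

The lazy variant replaces the consumed whitespace with a space, so it leaves a trailing space when the last word is removed; calling TrimEnd() on the result is one way to deal with that.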

How To get text between 2 strings?

The string from which I want to extract the text is given below.
String:
Hello Mr John and Hello Ms Rita
Regex
Hello(.*?)Rita
I am trying to get the text between two strings, "Hello" and "Rita". I am using the regex given above, but it gives me
Mr John and Hello Ms
which is wrong. I need only "Ms". Can anyone help me write a proper regex for this situation?
Use a tempered greedy token:
Hello((?:(?!Hello|Rita).)*)Rita
^^^^^^^^^^^^^^^^^^^
The (?:(?!Hello|Rita).)* is the tempered greedy token: it matches any character, one at a time, as long as that character does not start Hello or Rita. You may add word boundaries \b if you need to check for whole words.
In order to get a Ms without spaces on both ends, use this regex variation:
Hello\s*((?:(?!Hello|Rita).)*?)\s*Rita
Adding the ? to * will form a lazy quantifier *? that matches as few characters as needed to find a match, and \s* will match zero or more whitespaces.
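A short sketch contrasting the plain lazy pattern from the question with the tempered variant. The helper name FirstGroup is made up for illustration.

```csharp
using System;
using System.Text.RegularExpressions;

static class TemperedTokenDemo
{
    // Returns capture group 1 of the first match, or null when nothing matches.
    public static string FirstGroup(string input, string pattern)
    {
        Match m = Regex.Match(input, pattern);
        return m.Success ? m.Groups[1].Value : null;
    }

    static void Main()
    {
        string input = "Hello Mr John and Hello Ms Rita";

        // Lazy dot: the match still starts at the first "Hello".
        Console.WriteLine("[" + FirstGroup(input, @"Hello(.*?)Rita") + "]");
        // [ Mr John and Hello Ms ]

        // Tempered token: the group cannot run across another "Hello",
        // so the attempt from the first "Hello" fails and the second one wins.
        Console.WriteLine("[" + FirstGroup(input, @"Hello\s*((?:(?!Hello|Rita).)*?)\s*Rita") + "]");
        // [Ms]
    }
}
```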
To get the match closest to the ending word, put a greedy .* in front of the initial word so that it consumes as much as possible, leaving only the last Hello before Rita.
.*Hello(.*?)Rita
Or without whitespace in captured: .*Hello\s*(.*?)\s*Rita
Or with use of two capture groups: .*(Hello\s*(.*?)\s*Rita)
Your (.*?) is picking up too much text because the regex engine starts matching at the first "Hello", and .*? can match any characters. So it grabs everything from the first "Hello" to "Rita" at the end.
One easy way you could get what you want is with this regular expression:
Hello (\S+) Rita
\S matches any non-whitespace character, so \S+ matches any consecutive string of non-whitespace characters, i.e. a single word.
This would be a bit more robust, allowing for multiple spaces or other whitespace between the words:
Hello\s+(\S+)\s+Rita
You can use lookahead and lookbehind: (?<=Hello).*?(?=Rita)
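For comparison, a sketch that runs the greedy-prefix, \S+ and lookaround patterns from the answers above against the sample string. The class and variable names are illustrative only.

```csharp
using System;
using System.Text.RegularExpressions;

static class BetweenWordsDemo
{
    public const string Input = "Hello Mr John and Hello Ms Rita";

    static void Main()
    {
        // Greedy prefix: .* backtracks to the last "Hello" before "Rita".
        Match greedyPrefix = Regex.Match(Input, @".*Hello\s*(.*?)\s*Rita");
        Console.WriteLine("[" + greedyPrefix.Groups[1].Value + "]");   // [Ms]

        // Single word between the two markers.
        Match singleWord = Regex.Match(Input, @"Hello\s+(\S+)\s+Rita");
        Console.WriteLine("[" + singleWord.Groups[1].Value + "]");     // [Ms]

        // Lookarounds alone do not change where the match starts:
        // it still begins right after the first "Hello".
        Match lookaround = Regex.Match(Input, @"(?<=Hello).*?(?=Rita)");
        Console.WriteLine("[" + lookaround.Value + "]");               // [ Mr John and Hello Ms ]
    }
}
```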

Regular expression to match exactly the start of a string

I'm trying to build a regular expression in C# to check whether a string follows a specific format.
The format I want is: [digit][white space][dot][letters]
For example:
123 .abc follows the format
12345 .def follows the format
123 abc does not follow the format
I wrote this expression, but it does not work completely:
Regex.IsMatch(exampleString, @"^\d+ .")
^ matches the start of the string, and you got it right.
\d+ matches one or more digits, and you got that one right as well.
A space in a regex matches a literal space, so that works too!
However, a . is a wildcard and will match any one character. You will need to escape it with a backslash like this if you want to match a literal period: \..
To match letters now, you can use [a-z]+ right after the period.
@"^\d+ \.[a-z]+"
The dot is a special character in regex, which matches any character (except, typically, newlines). To match a literal ., you need to escape it:
Regex.IsMatch(exampleString, @"^\d+ \.")
If you want to include the condition for the succeeding letters, use:
Regex.IsMatch(exampleString, @"^\d+ \.[A-Za-z]+$")
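A minimal sketch that checks the three sample strings from the question against both the unescaped and the escaped pattern. The method names are hypothetical.

```csharp
using System;
using System.Text.RegularExpressions;

static class FormatCheckDemo
{
    // Pattern from the question: the unescaped dot accepts any character.
    public static bool LooseCheck(string s) => Regex.IsMatch(s, @"^\d+ .");

    // Escaped dot followed by letters only, anchored at both ends.
    public static bool StrictCheck(string s) => Regex.IsMatch(s, @"^\d+ \.[A-Za-z]+$");

    static void Main()
    {
        foreach (string s in new[] { "123 .abc", "12345 .def", "123 abc" })
        {
            Console.WriteLine(s + " loose=" + LooseCheck(s) + " strict=" + StrictCheck(s));
        }
        // 123 .abc loose=True strict=True
        // 12345 .def loose=True strict=True
        // 123 abc loose=True strict=False
    }
}
```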
For yours to match, keep in mind that the period in regular expressions is a special character that matches any character, so you'll need to escape it.
In addition, \s matches any white-space character (spaces, tabs, line breaks).
^\d+\s+\..+
(untested)

Regex to find specific word starting with a specific char

Using Regex, I need to find a word within a string that starts with specific char. The word must be alphanumeric, but may contain underscore (_) within the word. underscore at the beginning and end of the word is not acceptable.
For example I have the following string.
#word1 Message ## message # message #word2_ message #word#3 #_word4 mesagge #word_5
The result should be:
#word1 #word_5
Thanks.
Use regex pattern
(?:^|(?<=\s))#(?!_)\w+(?<!_)(?:(?=\s)|$)
or
(?:^|(?<=\W))#(?!_)\w+(?<!_)(?:(?=\W)|$)
depending on what you need or want to allow in front of and behind the word.
The second pattern is for cases such as #word_5 #word1. #word#2 #word*3, where #word1 should match because the dot . is treated as a separator or the end of a sentence.
This Regex will do it!
(?<=(^|\s))#([a-zA-Z0-9]{1}\w*[a-zA-Z0-9]|[a-zA-Z0-9]{1})(?=(\s|$))
It also matches single letter
This will work - the bounds (lines 1 and 3) are fairly heavy because \b, the word boundary, won't work here since you don't want to match "#word#3", and the "#" character after "d" triggers a word boundary.
(?<=\s|^)
#(?!_)\w+(?<!_)
(?=\s|$)
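As a sketch, the three parts above can be joined into one pattern and run against the sample string from the question. The names TagFinder and FindTags are invented for this example.

```csharp
using System;
using System.Linq;
using System.Text.RegularExpressions;

static class TagFinder
{
    // The three parts joined into a single line.
    const string Pattern = @"(?<=\s|^)#(?!_)\w+(?<!_)(?=\s|$)";

    // Returns every word that starts with '#' and neither begins nor ends with '_'.
    public static string[] FindTags(string input) =>
        Regex.Matches(input, Pattern).Cast<Match>().Select(m => m.Value).ToArray();

    static void Main()
    {
        string input = "#word1 Message ## message # message #word2_ message #word#3 #_word4 mesagge #word_5";
        Console.WriteLine(string.Join(" ", FindTags(input))); // #word1 #word_5
    }
}
```

The parts are joined on one line here because, under RegexOptions.IgnorePatternWhitespace, an unescaped # starts a comment; keeping the multi-line layout with that option would require writing the literal as \#.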
