.NET Regular Expression matching the string "C#"

I am trying to build a regular expression (using .NET's Regex class) to match the sequence of characters "C#" with word boundaries.
So when searching inside the string "I am a C# developer, but I am not a C#developer", I am trying to match the first "C#" (as a word) but not the second "C#", which is part of a word.
I have tried the pattern "\bC#\b", with no matches.
I have also tried the pattern "\bC\#\b" (trying to escape the #), with no matches.
I have read somewhere that the pound sign (#) can be interpreted as a word boundary. Is this true? And if so, how can I search for the string "C#" as a word?

The \b does not match between the pound sign and a space, because both are non-word characters, but it does match between the pound sign and the "d".
Instead of a second word boundary \b, you could assert that the character on the right is not a non-whitespace character \S, using a negative lookahead (?!...):
\bC#(?!\S)
As pointed out in the comments by @elgonzo, (?!\S) fails when C# is directly followed by punctuation such as a comma. To allow that, you could use a positive lookahead asserting that what is on the right is either a non-word character \W or the end of the string $:
\bC#(?=\W|$)
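A minimal C# sketch comparing the patterns discussed above, assuming .NET 5 or later with top-level statements. The input is the question's sample sentence, and "I write C#, mostly" is a made-up string added to show the trailing-comma case. Plain \b finds only the glued "C#". The negative lookahead finds only the standalone one, and only (?=\W|$) accepts "C#,".

```csharp
using System;
using System.Text.RegularExpressions;

// Sample sentence: the first "C#" stands alone, the second is glued to a word.
string input = "I am a C# developer, but I am not a C#developer";

// \b after '#' needs a word character on its right, so only the glued "C#" matches.
MatchCollection plain = Regex.Matches(input, @"\bC#\b");

// Negative lookahead: "C#" must not be followed by a non-whitespace character.
MatchCollection notFollowedByText = Regex.Matches(input, @"\bC#(?!\S)");

// With trailing punctuation, (?!\S) fails because ',' is non-whitespace,
// whereas (?=\W|$) accepts any non-word character or the end of the string.
string withComma = "I write C#, mostly";
bool lookaheadOnComma = Regex.IsMatch(withComma, @"\bC#(?!\S)");
bool nonWordOnComma = Regex.IsMatch(withComma, @"\bC#(?=\W|$)");

Console.WriteLine($"plain \\b: {plain.Count} match at index {plain[0].Index}");
Console.WriteLine($"(?!\\S): {notFollowedByText.Count} match at index {notFollowedByText[0].Index}");
Console.WriteLine($"on \"{withComma}\": (?!\\S)={lookaheadOnComma}, (?=\\W|$)={nonWordOnComma}");
```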

The following regex matches "C#" as the first word, as the last word, or surrounded by whitespace (including newlines), case-insensitively:
/(?:^|\s)C#(?:$|\s)/i
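In .NET there is no /.../i literal. The i flag becomes RegexOptions.IgnoreCase, and without RegexOptions.Multiline the ^ and $ anchors refer to the whole string, so newlines are handled by \s. The sketch below assumes .NET 5+ top-level statements and uses made-up inputs. It also shows a side effect of this approach: the surrounding whitespace becomes part of the match.

```csharp
using System;
using System.Text.RegularExpressions;

// The /i flag from the answer becomes RegexOptions.IgnoreCase.
var regex = new Regex(@"(?:^|\s)C#(?:$|\s)", RegexOptions.IgnoreCase);

// "c#" at the start, "C#" after a newline and "C#" at the very end all match.
string text = "c# is fun\nC# is fast\nI write C#";
MatchCollection found = regex.Matches(text);
foreach (Match m in found)
{
    // The match includes the surrounding whitespace, so trim it for display.
    Console.WriteLine($"[{m.Value.Trim()}] at index {m.Index}");
}

// Caveat: the whitespace is consumed, so two occurrences separated by a single
// space share it and the second one is missed.
int adjacent = regex.Matches("C# C#").Count; // 1, not 2
Console.WriteLine($"\"C# C#\" -> {adjacent} match");
```

If the whitespace must not be consumed, lookarounds such as (?<!\S)C#(?!\S) avoid the shared-space problem.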

Related

How to match exactly one or more characters inside boundary

Currently I am using this pattern: [HelloWorld]{1,}.
So if my input is Hello, it matches.
But if my input is WorldHello, it still matches, which is not what I want.
How can I make the input string match the value inside the pattern exactly?
Just get rid of the square brackets and the comma, and you're good to go:
HelloWorld{1}
(The {1} applies only to the final d and is redundant, so this is the same as plain HelloWorld.)
In a regex, what's between square brackets is a character class (a set of single characters).
So [HelloWorld] matches 1 character that's in the set [edlorHW].
And {1,} and + both mean "one or more", so .{1,} and .+ are equivalent.
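A quick C# check of the difference between a character class and literal text, assuming .NET 5+ top-level statements. The ^ and $ anchors are added here so that IsMatch tests the whole input, and "Hold" is a made-up input. Any string built only from the letters H, e, l, o, W, r and d satisfies the class.

```csharp
using System;
using System.Text.RegularExpressions;

// [HelloWorld] is a set of single characters (H, e, l, o, W, r, d),
// so one or more of them matches any string built only from those letters.
bool classOnWorldHello = Regex.IsMatch("WorldHello", "^[HelloWorld]+$");
bool classOnHold = Regex.IsMatch("Hold", "^[HelloWorld]+$");

// Without brackets the pattern is the literal text "HelloWorld".
bool literalOnWorldHello = Regex.IsMatch("WorldHello", "^HelloWorld$");
bool literalOnHelloWorld = Regex.IsMatch("HelloWorld", "^HelloWorld$");

Console.WriteLine($"[HelloWorld]+ : WorldHello={classOnWorldHello}, Hold={classOnHold}");
Console.WriteLine($"HelloWorld    : WorldHello={literalOnWorldHello}, HelloWorld={literalOnHelloWorld}");
```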
What you probably want is the literal word.
So the regex would simply be "HelloWorld".
That would match HelloWorld even inside the string "blaHelloWorldbla".
What if you want it to match only as a standalone word, not as part of a longer word?
Then you could use word boundaries \b. They match at the transition between a word character \w and a non-word character \W, or at the start or end of the string when the first or last character is a word character. (In .NET, \w covers [A-Za-z0-9_] plus other Unicode letters and digits.)
For example, @"\bHelloWorld\b" gets a match from "bla HelloWorld bla" but not from "blaHelloWorldbla".
Note that this time the regex string is preceded by @.
That makes it a verbatim string, so the backslashes don't have to be escaped.
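The sketch below checks the word-boundary example and the verbatim-string point in C#, assuming .NET 5+ top-level statements. It also shows why the @ matters: in a regular string literal "\b" is a backspace character, so the pattern silently stops meaning "word boundary".

```csharp
using System;
using System.Text.RegularExpressions;

// Verbatim string: the backslashes reach the regex engine unchanged.
string verbatim = @"\bHelloWorld\b";
// Regular string: every backslash must be doubled to produce the same pattern.
string escaped = "\\bHelloWorld\\b";
// Pitfall: in a regular string "\b" is a backspace character, not a word boundary.
string backspace = "\bHelloWorld\b";

bool samePattern = verbatim == escaped;
bool standalone = Regex.IsMatch("bla HelloWorld bla", verbatim);
bool embedded = Regex.IsMatch("blaHelloWorldbla", verbatim);
bool backspaceVersion = Regex.IsMatch("bla HelloWorld bla", backspace);

Console.WriteLine($"same pattern: {samePattern}");
Console.WriteLine($"standalone: {standalone}, embedded: {embedded}, backspace version: {backspaceVersion}");
```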
It seems you need an online regex tester to check your pattern, and it is also worth studying the C# regex reference.
Try this pattern:
[a-zA-Z]{1,}

Word replacement using regex if the target word is not a part of another word

I am working on a regex that replaces a word only if it is a standalone word and not part of another word, for example the word "thing". In "Something", the substring "thing" should be ignored, but if "thing" is preceded by a special character such as a dot or a bracket, I want it captured. I also want the word captured if it is followed by a bracket, dot, comma, or any other non-alphanumeric character.
In the string
Something is a thing, and one more thingy and (thing and more thing
In the sentence above, three occurrences of "thing" should be marked for replacement: the one in "a thing,", the one in "(thing", and the final "thing". I used the following regex:
\bthing\b
I tried the above sentence on regex101.com, and with this regex only the first word got highlighted. I understand that my regex would not capture (thing, but I thought it would capture the last word in the sentence, so that there would be at least 2 occurrences.
Can someone please help me modify my regex to capture all 3 occurrences in the sentence above?
You were likely using the JavaScript flavor, which stops after the first match unless the g (global) modifier is set. If you add the modifier g in the second box on regex101.com, it will find all matches.
This site is better for C# regex testing: http://derekslager.com/blog/posts/2007/09/a-better-dotnet-regular-expression-tester.ashx
If you code this in C# and use the Regex.Matches method, it returns every match, not just the first:
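using System;
using System.Text.RegularExpressions;
// Verbatim string: \b reaches the regex engine without double escaping.
var regex = new Regex(@"\bthing\b");
foreach (Match match in regex.Matches(
    "Something is a thing, and one more thingy and (thing and more thing"))
{
    Console.WriteLine(match.Value);
}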
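A shorthand for an alphanumeric character ([0-9A-Za-z], plus other Unicode letters and digits in .NET) is [^\W_], meaning a word character that is not an underscore.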
Using a lookbehind and lookahead you'd get
(?<![^\W_])thing(?![^\W_])
Expanded (this layout requires RegexOptions.IgnorePatternWhitespace):
(?<! [^\W_] )   # Not alphanum behind
thing           # 'thing'
(?! [^\W_] )    # Not alphanum ahead
This matches the three standalone occurrences of "thing" in:
Something is a thing, and one more thingy and (thing and more thing
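Because the question is about replacement, here is a minimal C# sketch that applies the lookaround pattern with Regex.Replace, assuming .NET 5+ top-level statements. "THING" is an arbitrary placeholder replacement, and "my_thing" is a made-up input. It shows the one practical difference from \bthing\b: the lookarounds treat '_' as a separator.

```csharp
using System;
using System.Text.RegularExpressions;

string input = "Something is a thing, and one more thingy and (thing and more thing";

// [^\W_] = a word character that is not '_', i.e. a letter or digit.
// The lookbehind and lookahead forbid such a character directly before or after "thing".
var standalone = new Regex(@"(?<![^\W_])thing(?![^\W_])");
int count = standalone.Matches(input).Count;

// "THING" is only a placeholder replacement text for illustration.
string replaced = standalone.Replace(input, "THING");

// Unlike \bthing\b, the lookarounds accept "thing" right after an underscore.
bool lookaroundUnderscore = standalone.IsMatch("my_thing");
bool boundaryUnderscore = Regex.IsMatch("my_thing", @"\bthing\b");

Console.WriteLine(count);
Console.WriteLine(replaced);
Console.WriteLine($"my_thing: lookarounds={lookaroundUnderscore}, \\b={boundaryUnderscore}");
```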

How To get text between 2 strings?

Below is the string from which I want to extract text.
String:
Hello Mr John and Hello Ms Rita
Regex
Hello(.*?)Rita
I am trying to get the text between the two strings "Hello" and "Rita" using the regex above, but it gives me
Mr John and Hello Ms
which is wrong; I need only "Ms". Can anyone help me write a proper regex for this situation?
Use a tempered greedy token:
Hello((?:(?!Hello|Rita).)*)Rita
(?:(?!Hello|Rita).)* is a tempered greedy token: it matches characters one at a time, as long as none of them starts Hello or Rita. You may add word boundaries \b if you need to check for whole words.
To get Ms without spaces on either side, use this variation:
Hello\s*((?:(?!Hello|Rita).)*?)\s*Rita
Adding ? to * forms the lazy quantifier *?, which matches as few characters as needed to find a match, and \s* matches zero or more whitespace characters.
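Both tempered-token variants can be checked in C# with Regex.Match and the first capture group, assuming .NET 5+ top-level statements. The greedy version keeps the spaces around "Ms". The lazy version with \s* trims them.

```csharp
using System;
using System.Text.RegularExpressions;

string input = "Hello Mr John and Hello Ms Rita";

// The tempered token (?:(?!Hello|Rita).) refuses to step onto another "Hello",
// so a match that starts at the first "Hello" can never reach "Rita".
Match tempered = Regex.Match(input, @"Hello((?:(?!Hello|Rita).)*)Rita");

// Lazy variant that also leaves the surrounding whitespace out of the group.
Match trimmed = Regex.Match(input, @"Hello\s*((?:(?!Hello|Rita).)*?)\s*Rita");

Console.WriteLine($"[{tempered.Groups[1].Value}]");
Console.WriteLine($"[{trimmed.Groups[1].Value}]");
```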
To get the match closest to the ending word, put a greedy .* in front of the initial word so it consumes as much as possible:
.*Hello(.*?)Rita
Or, without whitespace in the capture: .*Hello\s*(.*?)\s*Rita
Or, with two capture groups: .*(Hello\s*(.*?)\s*Rita)
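The three greedy-prefix variants can be tried in C# as follows, assuming .NET 5+ top-level statements. The leading .* first runs to the end of the input and then backtracks just far enough to find a "Hello". That makes "Hello" the last occurrence before "Rita".

```csharp
using System;
using System.Text.RegularExpressions;

string input = "Hello Mr John and Hello Ms Rita";

// The greedy .* backtracks from the end, so "Hello" binds to its last occurrence.
string spaced = Regex.Match(input, @".*Hello(.*?)Rita").Groups[1].Value;
string noSpaces = Regex.Match(input, @".*Hello\s*(.*?)\s*Rita").Groups[1].Value;

// Group 1 is the whole "Hello ... Rita" span, group 2 the text in between.
Match both = Regex.Match(input, @".*(Hello\s*(.*?)\s*Rita)");

Console.WriteLine($"[{spaced}] [{noSpaces}] [{both.Groups[1].Value}] [{both.Groups[2].Value}]");
```

Note that . does not match \n by default, so this only looks within a single line unless RegexOptions.Singleline is set.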
Your (.*?) picks up too much text because the engine starts at the leftmost "Hello", and from there even a lazy .*? has to keep going until it reaches "Rita". So it grabs everything from the first "Hello" to "Rita" at the end.
One easy way you could get what you want is with this regular expression:
Hello (\S+) Rita
\S matches any non-whitespace character, so \S+ matches any consecutive string of non-whitespace characters, i.e. a single word.
This would be a bit more robust, allowing for multiple spaces or other whitespace between the words:
Hello\s+(\S+)\s+Rita
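A short C# check of the \S+ approach, assuming .NET 5+ top-level statements. "Hello   Ms\tRita" and "Hello Ms Mary Rita" are made-up inputs. The second one shows the trade-off: only a single word can sit between the markers.

```csharp
using System;
using System.Text.RegularExpressions;

string input = "Hello Mr John and Hello Ms Rita";

// \S+ is one run of non-whitespace characters, i.e. a single word here.
string single = Regex.Match(input, @"Hello (\S+) Rita").Groups[1].Value;

// \s+ tolerates several spaces or a tab between the words.
string loose = Regex.Match("Hello   Ms\tRita", @"Hello\s+(\S+)\s+Rita").Groups[1].Value;

// Trade-off: two words between the markers produce no match at all.
bool twoWords = Regex.IsMatch("Hello Ms Mary Rita", @"Hello\s+(\S+)\s+Rita");

Console.WriteLine($"[{single}] [{loose}] two words: {twoWords}");
```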
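You can also use a lookbehind and a lookahead: (?<=Hello).*?(?=Rita). Note that on the sample string this still returns " Mr John and Hello Ms ". Matching starts right after the first Hello, so it has the same problem as the original pattern.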
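The following C# sketch checks the lookaround version against the sample string, assuming .NET 5+ top-level statements. The second pattern is not from the answer. It is a hypothetical combination with the tempered token from the first answer, so the match cannot contain another "Hello".

```csharp
using System;
using System.Text.RegularExpressions;

string input = "Hello Mr John and Hello Ms Rita";

// The lookbehind already succeeds right after the first "Hello", and the lazy .*?
// must then run all the way to "Rita", so the result spans the second "Hello".
string plain = Regex.Match(input, @"(?<=Hello).*?(?=Rita)").Value;

// Hypothetical fix: temper the dot so the match cannot run across another "Hello".
string tempered = Regex.Match(input, @"(?<=Hello)(?:(?!Hello).)*?(?=Rita)").Value;

Console.WriteLine($"[{plain}] [{tempered}] [{tempered.Trim()}]");
```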

Why does this Regex match?

I have written a regular expression to match the following criteria:
any digit (0-9)
hyphen
whitespace
in any order
length between 10 and 25
([0-9\-\w]{10,25})
I am using it to detect payment card numbers, so this works:
Regex.IsMatch("34343434343434", "([0-9\\-\\w]{10,25})"); // true
But this also works:
Regex.IsMatch("LogMethodComplete", "([0-9\\-\\w]{10,25})"); // true
What am I doing wrong?
This is C#.
Take a look at Regular Expression Language - Quick Reference, section Character Classes.
\w matches any word character (letters, digits and the underscore), not whitespace.
To match whitespace, you can use \s.
To match digits, you can use \d.
Instead of \w, use \d (a digit) together with \s (whitespace); a regex like
"[\d\-\s]{10,25}" matches your criteria. Add ^ and $ (^[\d\-\s]{10,25}$) if the whole input must consist of those characters.
You don't need to match "word" characters, which is what \w does.
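A C# sketch of why the original pattern accepts "LogMethodComplete", and how the digit/hyphen/whitespace class behaves with and without anchors, assuming .NET 5+ top-level statements. "3434-3434 3434-34" and "Order 12345678901 shipped" are made-up inputs. If only ASCII digits should count, [0-9] is stricter than \d, because .NET's \d also matches other Unicode decimal digits.

```csharp
using System;
using System.Text.RegularExpressions;

// The question's pattern: \w already includes letters, so ten letters in a row suffice.
bool original = Regex.IsMatch("LogMethodComplete", @"[0-9\-\w]{10,25}");

// The suggested class, anchored with ^ and $ so the whole input must fit it.
const string cardLike = @"^[\d\-\s]{10,25}$";
bool digits = Regex.IsMatch("34343434343434", cardLike);
bool grouped = Regex.IsMatch("3434-3434 3434-34", cardLike);
bool letters = Regex.IsMatch("LogMethodComplete", cardLike);

// Without anchors, a long enough digit run inside other text still matches.
bool embeddedUnanchored = Regex.IsMatch("Order 12345678901 shipped", @"[\d\-\s]{10,25}");
bool embeddedAnchored = Regex.IsMatch("Order 12345678901 shipped", cardLike);

Console.WriteLine($"original on letters: {original}");
Console.WriteLine($"anchored: digits={digits}, grouped={grouped}, letters={letters}");
Console.WriteLine($"embedded: unanchored={embeddedUnanchored}, anchored={embeddedAnchored}");
```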

C# Regular Expression to match letters, numbers and underscore

I am trying to create a regular expression pattern in C#. The pattern can only allow for:
letters
numbers
underscores
So far I am having little luck (I'm not good at regex). Here is what I have tried so far:
// Create the regular expression
string pattern = @"\w+_";
Regex regex = new Regex(pattern);
// Compare a string against the regular expression
return regex.IsMatch(stringToTest);
EDIT:
@"^[a-zA-Z0-9_]+$" (the underscore needs no escaping inside a character class)
or
@"^\w+$"
@"^\w+$"
\w matches any "word character", defined as digits, letters, and underscores. It's Unicode-aware so it'll match letters with umlauts and such (better than trying to roll your own character class like [A-Za-z0-9_] which would only match English letters).
The ^ at the beginning means "match the beginning of the string here", and the $ at the end means "match the end of the string here". Without those, e.g. if you just had @"\w+", then "##Foo##" would match, because it contains one or more word characters. With the ^ and $, "##Foo##" would not match (which sounds like what you're looking for), because it is not beginning-of-string followed by one or more word characters followed by end-of-string.
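The anchoring and Unicode claims above can be checked with a few Regex.IsMatch calls, assuming .NET 5+ top-level statements. "Foo_Bar1" and "Grüße" are made-up inputs, and "Grüße" is written with \u escapes to avoid source-encoding issues. Two .NET details are also worth knowing. RegexOptions.ECMAScript narrows \w to ASCII, and $ also matches just before a final newline, so \z is the strict end-of-string anchor.

```csharp
using System;
using System.Text.RegularExpressions;

// Unanchored: succeeds as soon as any run of word characters exists.
bool unanchored = Regex.IsMatch("##Foo##", @"\w+");
// Anchored: the whole string must consist of word characters.
bool anchored = Regex.IsMatch("##Foo##", @"^\w+$");
bool valid = Regex.IsMatch("Foo_Bar1", @"^\w+$");

// "Grüße" written with escapes: \w is Unicode-aware, an ASCII-only class is not.
string german = "Gr\u00FC\u00DFe";
bool unicodeWord = Regex.IsMatch(german, @"^\w+$");
bool asciiClass = Regex.IsMatch(german, @"^[a-zA-Z0-9_]+$");
// RegexOptions.ECMAScript narrows \w to [a-zA-Z_0-9].
bool ecmaWord = Regex.IsMatch(german, @"^\w+$", RegexOptions.ECMAScript);

// $ also matches just before a final newline; \z is the strict end of string.
bool trailingNewlineDollar = Regex.IsMatch("Foo\n", @"^\w+$");
bool trailingNewlineZ = Regex.IsMatch("Foo\n", @"^\w+\z");

Console.WriteLine($"##Foo##: unanchored={unanchored}, anchored={anchored}; Foo_Bar1: {valid}");
Console.WriteLine($"Unicode: \\w={unicodeWord}, ASCII class={asciiClass}, ECMAScript={ecmaWord}");
Console.WriteLine($"\"Foo\\n\": $={trailingNewlineDollar}, \\z={trailingNewlineZ}");
```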
Try experimenting with something like http://www.weitz.de/regex-coach/, which lets you develop a regex interactively.
It's designed for Perl, but helped me understand how a regex works in practice.
Regex packedasciiRegex = new Regex(@"^[!#$%&'()*+,-./:;?@[\]^_]*$");

