A regular expression for matching a simple word in C#?

I need a regular expression that matches only words meeting the following conditions. I am using it in my C# program.
Can be any case
Should not contain any numbers
May contain - and ' characters, but they are optional
Should start with a letter
I have tried using the expression ^([a-zA-Z][\'\-]?)+$ but it doesn't work.
Here is a list of a few words that are acceptable:
London (case insensitive)
Jackson's
non-profit
Here is a list of a few words that are not acceptable:
12london (contains a number and does not start with a letter)
-to (does not start with a letter)
to: (contains a : character; no special character other than - and ' is allowed)

^[a-zA-Z][-'a-zA-Z]*$
This matches any word that starts with an alphabetical character, followed by any number of alphabetical characters, - or '.
Note that you don't need to escape the - and ' inside a [] character class, as long as the dash is either the first or the last character in the class.
Note also that I've removed the round brackets from your example - if you don't want to capture the input, you'll get better performance by leaving them out.
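For reference, here is a minimal C# sketch of how this pattern could be applied. The class name SimpleWord and the method IsValid are illustrative names and are not part of the question.

```csharp
using System.Text.RegularExpressions;

public static class SimpleWord
{
    // Pattern from the answer above: one letter, then any mix of
    // letters, hyphens and apostrophes.
    private static readonly Regex Pattern = new Regex(@"^[a-zA-Z][-'a-zA-Z]*$");

    // Returns true when the whole input is a single acceptable word.
    public static bool IsValid(string word) => Pattern.IsMatch(word);
}
```

With the sample words from the question, London, Jackson's and non-profit are accepted, while 12london, -to and to: are rejected.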

Try this one:
^[A-Za-z]+[A-Za-z'-]*$

First of all, try your regexes against tools such as http://www.regextester.com/
Your pattern is anchored at both ends (^ means start of line, $ is the end), so it only matches when the whole string is a single word and misses words that sit between two spaces in a longer string.
To find words inside a longer string, use \b (word boundary) or \B (non-boundary).
Instead of [a-zA-Z] you could use a shorthand class such as \D (not a digit), but note that \D also matches punctuation and whitespace.
Let me know if the above is working in your scenario.
\b\D[^\c][a-zA-Z]+[^\c]
It says: word boundaries with no digits, no control characters, one or more alphabetical lower or uppercase character, with no following control characters.

Related

How to match string by using regular expression which will not allow same special character at same time?

I'm trying to match a string that must not contain two special characters in a row.
My regular expression is:
[RegularExpression(@"^+[a-zA-Z0-9]+[a-zA-Z0-9.&' '-]+[a-zA-Z0-9]$")]
This meets all my requirements except for the two issues below.
This is my string: bracks
Acceptable:
bra-cks, b-r-a-c-ks, b.r.a.c.ks, bra cks (the regular expression above already handles these)
Not acceptable:
Issue 1: b.. or bra..cks, b..racks, bra...cks (two or more special characters together)
Issue 2: bra  cks (two or more whitespace characters together)

You can use a negative lookahead to invalidate strings containing two consecutive special characters:
^(?!.*[.&' -]{2})[a-zA-Z0-9.&' -]+$
Demo: https://regex101.com/r/7j14bu/1
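A small C# sketch of the lookahead approach follows. The class and method names are made up for illustration; only the pattern comes from the answer.

```csharp
using System.Text.RegularExpressions;

public static class NoAdjacentSpecials
{
    // (?!.*[.&' -]{2}) rejects the input if two special characters
    // appear next to each other anywhere in the string.
    private static readonly Regex Pattern =
        new Regex(@"^(?!.*[.&' -]{2})[a-zA-Z0-9.&' -]+$");

    public static bool IsValid(string input) => Pattern.IsMatch(input);
}
```

Note that this pattern on its own does not require the string to start or end with an alphanumeric character, so an input such as bracks. is still accepted.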

The goal
From what I can tell from your description and pattern, you are trying to match text that starts and ends with an alphanumeric character (because of ^+[a-zA-Z0-9] and [a-zA-Z0-9]$ in your original pattern) and that contains no two consecutive (adjacent) special characters, which, again guessing from the regex, are . & ' - and the space.
What was wrong
^+ - I think you wanted to ensure that the match starts at the beginning of the line/string, so you don't need the + here
[a-zA-Z0-9.&' '-] - in this character class you listed ' twice, which is unnecessary
Solution
Please try pattern
^[a-zA-Z0-9](?:(?![.& '-]{2,})[a-zA-Z0-9.& '-])*[a-zA-Z0-9]$
Pattern explanation
^ - anchor, match the beginning of the string
[a-zA-Z0-9] - character class, match one of the characters inside []
(?:...) - non capturing group
(?!...) - negative lookahead
[.& '-]{2,} - match 2 or more of characters inside character class
[a-zA-Z0-9.& '-] - character class, match one of the characters inside []
* - match zero or more repetitions of the preceding pattern
$ - anchor, match the end of the string

Some remarks on your current regex:
It looks like you placed the + quantifiers before the pattern you wanted to quantify, instead of after. For instance, ^+ doesn't make much sense, since ^ is just the start of the input, and most regex engines would not even allow that.
The pattern [a-zA-Z0-9.&' '-]+ doesn't distinguish between alphanumerical and other characters, while you want the rules for them to be different. Especially for the other characters you don't want them to repeat, so that + is not desired for those.
In a character class it doesn't make sense to repeat the same character, like you have a repeat of a quote ('). Maybe you wanted to somehow delimit the space, but realise that those quotes are interpreted literally. So probably you should just remove them. Or if you intended to allow for a quote, only list it once.
Here is a correction (add the quote if you still need it):
^[a-zA-Z0-9]+(?:[.& -][a-zA-Z0-9]+)*$
Follow-up
Based on a comment, I suspect you would allow a non-alphanumerical character to be surrounded by single spaces, even if that gives a sequence of more than one non-alphanumerical character. In that case use this:
^[a-zA-Z0-9]+(?:(?:[ ]|[ ]?[.&-][ ]?)[a-zA-Z0-9]+)*$
So here the space gets a different role: it can optionally occur before and after a delimiter (one of ".&-"), or it can occur on its own. The brackets around the spaces are not needed, but I used them to stress that the space is intended and not a typo.
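To compare the two corrected patterns side by side, here is a hedged C# sketch. The names DelimitedName, IsStrict and IsRelaxed are hypothetical; the patterns are the ones given in this answer.

```csharp
using System.Text.RegularExpressions;

public static class DelimitedName
{
    // First correction: runs of alphanumerics separated by exactly one delimiter.
    private static readonly Regex Strict =
        new Regex(@"^[a-zA-Z0-9]+(?:[.& -][a-zA-Z0-9]+)*$");

    // Follow-up: a delimiter may additionally be padded by single spaces.
    private static readonly Regex Relaxed =
        new Regex(@"^[a-zA-Z0-9]+(?:(?:[ ]|[ ]?[.&-][ ]?)[a-zA-Z0-9]+)*$");

    public static bool IsStrict(string input) => Strict.IsMatch(input);

    public static bool IsRelaxed(string input) => Relaxed.IsMatch(input);
}
```

Under these patterns, bra & cks is rejected by the strict version but accepted by the relaxed one, while bra..cks and a double space are rejected by both.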

Regex for first name

I am quite new to regex and need a regex for a first name that satisfies the following conditions:
First Name must contain letters only. It may contain spaces, hyphens, or apostrophes.
It must begin with letters.
All other characters and numbers are not valid.
Special characters ‘ and – cannot be together (e.g. John’-s is not allowed)
An alphabet should be present before and after the special characters ‘ and – (e.g. John ‘s is not allowed)
Two consecutive spaces are not allowed (e.g. Annia  St is not allowed)
Can anyone help? I tried this ^([a-z]+['-]?[ ]?|[a-z]+['-]?)*?[a-z]$ but it's not working as expected.

Regexes are notoriously difficult to write and maintain.
One technique that I've used over the years is to annotate my regexes by using named capture groups. It's not perfect, but can greatly help with the readability and maintainability of your regex.
Here is a regex that meets your requirements.
^(?<firstchar>(?=[A-Za-z]))((?<alphachars>[A-Za-z])|(?<specialchars>[A-Za-z]['-](?=[A-Za-z]))|(?<spaces> (?=[A-Za-z])))*$
It is split down into the following parts:
1) (?<firstchar>(?=[A-Za-z])) This ensures the first character is an alpha character, upper or lowercase.
2) (?<alphachars>[A-Za-z]) We allow more alpha chars.
3) (?<specialchars>[A-Za-z]['-](?=[A-Za-z])) We allow special characters, but only with an alpha character before and after.
4) (?<spaces> (?=[A-Za-z])) We allow spaces, but only one space, which must be followed by alpha characters.
You should use a testing tool when writing regexes, I'd recommend https://regex101.com/
Take the regex I've given you, run it in https://regex101.com/ with samples you'd like to match against, and tweak it to fit your requirements. Hopefully I've given you enough information to be self sufficient in customising it to your needs.
You can use this link to run the regex https://regex101.com/r/O2wFfi/1/
Edit
I've updated to address the issue in your comment, rather than just give you the code, I will explain the problem and how I fixed it.
For your example "Sam D'Joe", if we run the original regex, the following happens.
^(?<firstchar>[A-Za-z])((?<alphachars>[A-Za-z])|(?<specialchars>[A-Za-z]['-][A-Za-z])|(?<spaces> [A-Za-z]))*$
1) ^ matches the start of the string
2) (?<firstchar>[A-Za-z]) matches the first character
3) (?<alphachars>[A-Za-z]) matches every character up to the space
4) (?<spaces> [A-Za-z]) matches the space and the subsequent alpha char
Matches consume the characters that they match
This is where we run into a problem. Our "specialchars" part of the regex matches an alpha char, our special char and then another alpha char ((?<specialchars>[A-Za-z]['-][A-Za-z])).
The thing you need to know about regexes, is each time you match a character, that character is then consumed. We've already matched the alpha char before the special character, so our regex will never match.
Each step actually looks like this:
1) ^ matches the start of the string
2) (?<firstchar>[A-Za-z]) matches the first character
3) (?<alphachars>[A-Za-z]) matches every character up to the space
4) (?<spaces> [A-Za-z]) matches the space and the subsequent alpha char
and then we're left with the following: 'Joe
We cannot match this, because one of our rules is "An alphabet should be present before and after the special characters ‘ and –".
Lookahead
Regex has a concept called "lookahead". A lookahead allows you to check for a character without consuming it!
The syntax for a lookahead is (?=...), where ... is what you want to match. E.g. (?=[A-Z]) would look ahead for a single character that is an uppercase letter.
We can fix our regex, by using lookaheads.
1) ^ matches the start of the string
2) (?<firstchar>[A-Za-z]) matches the first character
3) (?<alphachars>[A-Za-z]) matches every character up to the space
4) We now change our "spaces" regex to look ahead to the alpha char, so we don't consume it. We change (?<spaces> [A-Za-z]) to (?<spaces> (?=[A-Za-z])). This matches the space and looks ahead to the subsequent alpha char, but doesn't consume it.
5) (?<specialchars>[A-Za-z]['-][A-Za-z]) matches the alpha char, the special char, and the subsequent alpha char.
6) We use the * quantifier to repeat our previous 3 rules as many times as needed, and we match until the end of the line.
I also added lookaheads to the "firstchar", "specialchars" and "spaces" capture groups. The final regex is:
^(?<firstchar>(?=[A-Za-z]))((?<alphachars>[A-Za-z])|(?<specialchars>[A-Za-z]['-](?=[A-Za-z]))|(?<spaces> (?=[A-Za-z])))*$
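As a rough check of the final regex in C#, the sketch below wraps it in a helper. The class name FirstName and method IsValid are illustrative assumptions; the pattern is copied from above.

```csharp
using System.Text.RegularExpressions;

public static class FirstName
{
    // Named groups document each rule; the lookaheads (?=...) check the
    // next character without consuming it.
    private static readonly Regex Pattern = new Regex(
        @"^(?<firstchar>(?=[A-Za-z]))" +
        @"((?<alphachars>[A-Za-z])" +
        @"|(?<specialchars>[A-Za-z]['-](?=[A-Za-z]))" +
        @"|(?<spaces> (?=[A-Za-z])))*$");

    public static bool IsValid(string name) => Pattern.IsMatch(name);
}
```

With this helper, Sam D'Joe and Mary-Jane are accepted, while John'-s, John 's and a name with two consecutive spaces are rejected.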

This short regex should do it: ^([a-zA-Z]+?)([-\s'][a-zA-Z]+)*?$
([a-zA-Z]+?) - means the string must start with letters.
([-\s'][a-zA-Z]+)*? - means the string may continue with zero or more groups of a hyphen, whitespace character or apostrophe followed by letters.
^ and $ - denote start and end of string

Try this one
^[^- '](?=(?![A-Z]?[A-Z]))(?=(?![a-z]+[A-Z]))(?=(?!.*[A-Z][A-Z]))(?=(?!.*[- '][- '.]))(?=(?!.*[.][-'.]))[A-Za-z- '.]{2,}$

Regex to match more than one word

I have an ASP.NET MVC application containing a form field called 'First/last name'. I need to add some basic validation to ensure people enter at least two words. It doesn't need to be totally comprehensive in checking word length etc, we essentially just need to prevent people from entering just their first name which is what's happening currently. I don't want to limit to just alphabetic characters as some names include punctuation. I just want to ensure that people have entered at least two words separated by a space.
I have the following regex currently:
[RegularExpression(@"^((\b[a-zA-Z]{2,40}\b)\s*){2,}$", ErrorMessage = "Invalid first/last name")]
This works to an extent (it checks for 2 words) but it's invalid if punctuation is entered, which isn't what I'm looking for.
Could anyone suggest how to modify the above so that it doesn't matter if punctuation is used in the words? I'm not good with the regular expression syntax, hence asking here.
Thanks.

You want two words, so at least one space between them, and beyond that you want to allow everything else (e.g., punctuation). So keep it simple:
\w.*\s.*\w
Or if you must anchor it to start and end:
^.*\w.*\s.*\w.*$
These will match, for example, D' Addario (but not D'Artagnan by itself, since it counts as one word by the space criterion).

Maybe just:
@"\w\s\w"
word white space word

Hi you can use this regex for validation
^[a-zA-Z0-9]+ {1}[a-zA-Z0-9]+$
Demo http://rubular.com/r/YN8eFa1yFE

If you just want to allow a sequence of non-whitespace characters followed by 1 or more sequences of whitespace characters followed by non-whitespace characters, you can use
^\s*\S+(?:\s+\S+)+\s*$
It won't accept just First or First .
Regex breakdown:
^ - start of string
\s* - zero or more whitespace
\S+ - 1 or more non-whitespace symbols
(?:\s+\S+)+ - 1 or more sequences of ...
\s+ - 1 or more whitespace characters (remove + to allow only 1 whitespace character between words)
\S+ - 1 or more non-whitespace symbols
\s* - zero or more whitespace
$ - end of string
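The breakdown above can be tried out with a short C# sketch. TwoWords and HasAtLeastTwoWords are hypothetical names chosen for this example.

```csharp
using System.Text.RegularExpressions;

public static class TwoWords
{
    // Optional outer whitespace, one word, then at least one more
    // whitespace-separated word. Punctuation counts as part of a word.
    private static readonly Regex Pattern =
        new Regex(@"^\s*\S+(?:\s+\S+)+\s*$");

    public static bool HasAtLeastTwoWords(string input) => Pattern.IsMatch(input);
}
```

Because \S matches any non-whitespace character, names with punctuation such as Jean-Luc O'Neil pass, while a single word with or without a trailing space does not.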

One single regular expression to match multiple alphanumeric words from 15 to 20 characters

I need to find all the words that have between 15 and 20 characters in a big string. I also want to avoid matching a long word with something else attached at the end (for example 1234567890abcdef@asdf.com). I don't want that as a result, only words. Right now I'm splitting the string on white space and applying the following regular expression to each word:
^[a-zA-Z0-9]{15,20}$
Is there any chance to do both things using one regular expression?
I'm using C#.
Good examples to catch:
1234567890abcdeg
qwertyuiopasdfgh
1234567890abcdeg, (catch it but remove ",")
Examples to avoid: 1234567890abcdeg@gmail.com

Don't use start/end anchors (^/$), but word delimiters (\b):
\b[a-zA-Z0-9]{15,20}(?=[\s,]|$)
I used (?=[\s,]|$) instead of the end delimiter to force a space character or a comma or the end of the string. Expand it as needed.
You may want to do likewise for the first \b if you need to, for instance: (?<=\s|^).
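Here is a sketch of how both steps (splitting and validating) might be done with one expression in C#. It combines the lookbehind and lookahead suggested above; the class name LongWords and method Find are assumptions made for this example.

```csharp
using System.Collections.Generic;
using System.Linq;
using System.Text.RegularExpressions;

public static class LongWords
{
    // A word must be preceded by whitespace or the start of the string,
    // and followed by whitespace, a comma, or the end of the string.
    private static readonly Regex Pattern =
        new Regex(@"(?<=\s|^)[a-zA-Z0-9]{15,20}(?=[\s,]|$)");

    // Returns every 15 to 20 character alphanumeric word found in the text.
    public static List<string> Find(string text) =>
        Pattern.Matches(text).Cast<Match>().Select(m => m.Value).ToList();
}
```

The trailing comma is not part of the match because the lookahead only checks for it, so 1234567890abcdeg, yields 1234567890abcdeg, and the address 1234567890abcdeg@gmail.com is skipped entirely.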

Normally, you would use word boundaries (\b) before and after the alphanumerics:
\b[a-zA-Z0-9]{15,20}\b
However, there's a small detail to take into account: underscores ("_") are also considered word characters. The previous regex won't match the following text:
12345678901234567_
In order to avoid it, you can check if it's preceded and followed by either a \b or a "_", with lookarounds.
Regex:
(?<=\b|_)[a-zA-Z0-9]{15,20}(?=\b|_)

RegEx : Find match based on 1st two chars

I am new to RegEx and have a question about it. I am writing my code in C# and need to come up with a regex to find matching strings.
The possible strings I get are:
XYZF44DT508755
ABZF44DT508755
PQZF44DT508755
So what I need to check is whether the string starts with XY or AB or PQ.
I came up with this one and it doesn't work.
^((XY|AB|PQ).){2}
Note: I don't want to use regular string StartsWith()
UPDATE:
Now suppose I want to try a new matching condition like this:
If string starts with "XY" or "AB" or "PQ" and 3rd character is "Z" and 4th character is "F"
How to write the RegEx for that?

You can modify your expression to the following and use the IsMatch() method.
Regex.IsMatch(input, "^(?:XY|AB|PQ)")
The outer capturing group in conjunction with . (any single character) tries to match a third character, and the {2} quantifier then requires the whole sequence to occur twice.
According to your updated edit, you can simply place "ZF" after the grouping construct.
Regex.IsMatch(input, "^(?:XY|AB|PQ)ZF")
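Both checks can be wrapped in small helpers, as in the following sketch. PrefixCheck and its method names are hypothetical; Regex.IsMatch is the static method used in the answer.

```csharp
using System.Text.RegularExpressions;

public static class PrefixCheck
{
    // True when the input starts with XY, AB or PQ.
    public static bool HasPrefix(string input) =>
        Regex.IsMatch(input, "^(?:XY|AB|PQ)");

    // True when the prefix is additionally followed by Z and F.
    public static bool HasPrefixThenZF(string input) =>
        Regex.IsMatch(input, "^(?:XY|AB|PQ)ZF");
}
```

For the sample strings in the question, such as XYZF44DT508755, both helpers return true, while a string like XYAB12 passes only the first check.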

You want to test for just ^(XY|AB|PQ). Your RegEx means: search for either XY, AB or PQ, then any single character, and repeat the whole sequence twice. For example, "XYKPQL" would match your RegEx.
In the corrected pattern:
^ forces the start of line,
(...) creates a capturing group and
XY|AB|PQ matches either XY, AB or PQ.
If you want the next two characters to be ZF, just append ZF to the RegEx so it becomes ^(XY|AB|PQ)ZF.
Check out regex101, a great way to test your RegExes.

You were on the right track. ^(XY|AB|PQ) should match your string correctly.
The problem with ^((XY|AB|PQ).){2} is following the entire group with {2}. This means exactly 2 occurrences. That would be 2 occurrences of your first 2 characters, plus . (any single character), meaning this would match strings like XY_AB_. The _ could be anything.
It may have been your intention with the . to match a larger string. In this case you might try something along the lines of ^((XY|AB|PQ)\w*). The \w* will match 0 or more occurrences of "word characters", so this should match all of XYZF44DT508755 up to a space, line break, punctuation, etc., and not just the XY at the beginning.
There are some good tools out there for understanding regexes, one of my favorites is debuggex.
UPDATE
To answer your updated question:
If string starts with "XY" or "AB" or "PQ" and 3rd character is "Z" and 4th character is "F"
The regex would be (assuming you want to match the entire "word").
^((XY|AB|PQ)ZF\w*)
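If the goal is to pull out the whole matched "word" and not just test for it, a sketch along these lines may help. The class CodeExtractor and the choice to return an empty string on failure are assumptions for illustration.

```csharp
using System.Text.RegularExpressions;

public static class CodeExtractor
{
    // Matches the prefix, the literal ZF, and any following word characters.
    private static readonly Regex Pattern = new Regex(@"^((XY|AB|PQ)ZF\w*)");

    // Returns the matched word, or an empty string when there is no match.
    public static string Extract(string input)
    {
        Match match = Pattern.Match(input);
        return match.Success ? match.Value : string.Empty;
    }
}
```

Since \w* stops at the first non-word character, an input such as XYZF44DT508755 followed by a space and more text yields only XYZF44DT508755.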
