Regex length validation: Ignore leading and trailing whitespaces

I'm trying to limit our users to between 5 and 1024 characters per post.
We are currently using the ASP.NET RegularExpressionValidator for this.
This kind of works if I set the expression to the following:
body.ValidationExpression = string.Format("^[\\s\\S]{{{0},{1}}}$",
MinimumBodyLength,
MaximumBodyLength);
However, this does not take newlines and leading/trailing/multiple spaces into account. So users can type stuff like:
Ok (three spaces) . (dot), and it will still pass validation because the three spaces count as characters. The same goes for newlines. We can also type a single . (dot) followed by 5 spaces/newlines.
I have tried multiple regex variations I've found around the web, but none seem to fit my needs exactly. I want them to type a minimum of 5 characters and a maximum of 3000 characters, but I don't really care how many newlines and spaces they use.
To clarify, I want people to be able to type:
Hi,
My name is ben
I do not want them to be able to type:
Hi .
or
A B
(lots of newlines or spaces)
Is it possible that regex is not the way to go? If so, how can I search and replace on the string before the regex evaluates (while still catching it in the validator with the old expression)?

Use the regex below:
body.ValidationExpression = string.Format(@"^((\S)|((\s+|\n+|(\r\n)+)+\S)|(\S(\s+|\n+|(\r\n)+))+){{{0},{1}}}$",
MinimumBodyLength,
MaximumBodyLength);
It treats as a single entity either a single character or a single character preceded (or followed) by any number of whitespace characters.
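The same idea, counting each non-whitespace character together with any whitespace that follows it, can be sketched with a shorter pattern. This is only a sketch under assumptions: the class and method names (BodyLengthPattern, Build, IsValid) are made up for illustration, and the pattern is not the one from the answer above.

```csharp
using System;
using System.Text.RegularExpressions;

public static class BodyLengthPattern
{
    // Hypothetical helper: builds a pattern in which every repetition
    // consumes exactly one non-whitespace character plus any whitespace
    // after it, so {min,max} counts only the non-whitespace characters.
    public static string Build(int min, int max)
    {
        // {{ and }} are string.Format escapes for literal braces.
        return string.Format(@"^\s*(?:\S\s*){{{0},{1}}}$", min, max);
    }

    // Hypothetical helper: server-side check using the pattern above.
    public static bool IsValid(string body, int min, int max)
    {
        return Regex.IsMatch(body, Build(min, max));
    }
}
```

The string returned by Build could be assigned to body.ValidationExpression in the same way as the expressions above. Because \S and \s never match the same character, the pattern has only one way to split the input, which keeps backtracking small.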

If I understood your problem, you want to count only word characters. If that's the point, you could try this:
body.ValidationExpression = string.Format(@"^\w{{{0},{1}}}$",
MinimumBodyLength,
MaximumBodyLength);

Related

Regex groups expression not capturing content

I'm trying to create a large regex where the plan is to capture 6 groups.
It is going to be used to parse Android logs that have the following format:
2020-03-10T14:09:13.3250000 VERB CallingClass 17503 20870 Whatever content: this log line had (etc)
The expression I've created so far is the following:
(\d{4}-\d{2}-\d{2}T\d{2}:\d{2}:\d{2}.\d{7})\t([A-Za-z]{4})\t(\w{+})\t(\d{5})\t(\d{5})\t(.*$)
The lines in this case are tab separated, although the application that I'm developing will be dynamic to the point where this is not always the case, so I feel regex is still the best option even if it is heavier than performing a split.
Breaking down the groups in more detail, from my thought process:
Matches the date (I'm considering changing this to a x number of characters instead)
(\d{4}-\d{2}-\d{2}T\d{2}:\d{2}:\d{2}.\d{7})
Match a block of 4 characters
([A-Za-z]{4})
Match any number of characters until the next tab
(\w{+})
Match a block of 5 numbers 2 times
\t(\d{5})
At last, match everything else until the end of the line.
\t(.*$)
If I reduce the expression to the following, it works:
(\d{4}-\d{2}-\d{2}T\d{2}:\d{2}:\d{2}.\d{7})\t([A-Za-z]{4})\t(.*$)
This doesn't include 3 of the groups: the word and the 2 number blocks.
Any idea why this is?
Thank you.

The problem is \w{+} is going to match a word character followed by one or more { characters and then a final } character. If you want one or more word characters then just use plus without the curly braces (which are meant for specifying a specific number or number range, but will match literal curly braces if they do not adhere to that format).
(\d{4}-\d{2}-\d{2}T\d{2}:\d{2}:\d{2}.\d{7})\t([A-Za-z]{4})\t(\w+)\t(\d{5})\t(\d{5})\t(.*$)
I highly recommend using https://regex101.com/ for the explanation to see if your expression matches up with what you want spelled out in words. However for testing for use in C# you should use something else like http://regexstorm.net/tester
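As a rough sketch, the corrected expression could be used from C# as follows. The class and method names (LogLineParser, Parse) are assumptions made for illustration, and the sketch also escapes the dot before the fractional seconds and anchors the pattern with ^ and $, which the expression above does not do.

```csharp
using System;
using System.Text.RegularExpressions;

public static class LogLineParser
{
    // Six capture groups: timestamp, level, class, two 5-digit ids, message.
    private static readonly Regex LinePattern = new Regex(
        @"^(\d{4}-\d{2}-\d{2}T\d{2}:\d{2}:\d{2}\.\d{7})\t([A-Za-z]{4})\t(\w+)\t(\d{5})\t(\d{5})\t(.*)$");

    // Returns the six captured fields, or null when the line does not match.
    public static string[] Parse(string line)
    {
        Match m = LinePattern.Match(line);
        if (!m.Success)
        {
            return null;
        }

        var fields = new string[6];
        for (int i = 0; i < fields.Length; i++)
        {
            // Groups[0] is the whole match, so captures start at index 1.
            fields[i] = m.Groups[i + 1].Value;
        }
        return fields;
    }
}
```

Calling LogLineParser.Parse on a tab-separated line like the sample in the question should give the class name in the third field and the free-form message in the sixth.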

Validate first 2 characters of a textfield followed by 4 int

I need validation in a text box so that when I enter a value, the first two characters must be letters only, followed by exactly 4 digits.
I wrote code like the following:
if (Regex.IsMatch(Txtvalue.Text, "^[A-Za-z]{2}d{4}.+$"))
It only shows an error; it does not validate properly.

You might want to use an editor (e.g. regex101) to debug regexes while writing them.
One problem I see in your regex is that you are matching d literally.
I suppose you meant to write \d to match digits.
So ^[A-Za-z]{2}\d{4}.+$ should work.
Another thing I suspect is that you don't want the + quantifier, as it prevents ab1234 from matching unless it is followed by at least one more character. To solve this, use the * quantifier instead.
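A minimal sketch of the check in C#, assuming that nothing at all may follow the four digits (so the trailing .+ or .* is dropped). The class and method names (CodeValidator, IsValid) are made up for illustration.

```csharp
using System;
using System.Text.RegularExpressions;

public static class CodeValidator
{
    // Two ASCII letters followed by exactly four digits.
    // The verbatim string (@"...") keeps \d intact for the regex engine.
    public static bool IsValid(string value)
    {
        return Regex.IsMatch(value, @"^[A-Za-z]{2}\d{4}$");
    }
}
```

If extra characters after the digits should be accepted, as the answer above assumes, the $ can be replaced with .*$ instead.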

Simple phone number regex to match numbers, spaces, etc

I'm trying to modify a fairly basic regex pattern in C# that tests for phone numbers.
The pattern is -
[0-9]+(\.[0-9][0-9]?)?
I have two questions -
1) The existing expression does work (although it is fairly restrictive) but I can't quite understand how it works. Regexps for similar issues seem to look more like this one -
/^[0-9()]+$/
2) How could I extend this pattern to allow brackets, periods and a single space to separate numbers. I tried a few variations to include -
[0-9().+\s?](\.[0-9][0-9]?)?
Although I can't seem to create a valid pattern.
Any help would be much appreciated.
Thanks,

[0-9]+(\.[0-9][0-9]?)?
First of all, I recommend checking out either regexr.com or regex101.com, so you yourself get an understanding of how regex works. Both websites will give you a step-by-step explanation of what each symbol in the regex does.
Now, one of the main things you have to understand is that regex has special characters. This includes, among others, the following: []().-+*?\^$. So, if you want your regex to match a literal ., for example, you would have to escape it, since it's a special character. To do so, either use \. or [.]. Backslashes serve to escape other characters, while [] means "match any one of the characters in this set". Some special characters don't have a special meaning inside these brackets and don't require escaping.
Therefore, the regex above will match any combination of digits of length 1 or more, followed by an optional suffix (foobar)?, which has to be a dot, followed by one or two digits. In fact, this regex seems more like it's supposed to match decimal numbers with up to two digits behind the dot - not phone numbers.
/^[0-9()]+$/
What this does is pretty simple - match any combination of digits or round brackets that has the length 1 or greater.
[0-9().+\s?](\.[0-9][0-9]?)?
What you're matching here is:
one of: a digit, round bracket, dot, plus sign, whitespace or question mark; but exactly once only!
optionally followed by a dot and one or two digits
A suitable regex for your purpose could be:
(\+\d{2})?((\(0\)\d{2,3})|\d{2,3})?\d+
Enter this in one of the websites mentioned above to understand how it works. I modified it a little to also allow, for example +49 123 4567890.
Also, for simplicity, I didn't include spaces - so when using this regex, you have to remove all the spaces in your input first. In C#, that should be possible with yourString.Replace(" ", ""); (simply replacing all spaces with nothing = deleting spaces)
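Put together, stripping the spaces and then applying the pattern might look like the sketch below. The class and method names (PhoneCheck, LooksLikePhoneNumber) are hypothetical, and the ^ and $ anchors are an addition so that the whole input has to match rather than just part of it.

```csharp
using System;
using System.Text.RegularExpressions;

public static class PhoneCheck
{
    // Pattern from the answer, anchored (an assumption) to the full input.
    private static readonly Regex Pattern = new Regex(
        @"^(\+\d{2})?((\(0\)\d{2,3})|\d{2,3})?\d+$");

    public static bool LooksLikePhoneNumber(string input)
    {
        // Remove all spaces first, since the pattern does not allow them.
        string compact = input.Replace(" ", "");
        return Pattern.IsMatch(compact);
    }
}
```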

The + after the character set is a quantifier: the preceding character, character set or group is repeated at least once and up to an unlimited number of times. It is also greedy (it matches as much as possible).
So [0-9().+\s]+ will match any character in the set one or more times.

match any a-z/A-Z and - character after certain regular expression

i need a certain string to be in this format:
[0000] anyword
so between the [] brackets i need 4 numbers, followed by a whitespace. after that only characters ranging from a to z and - characters are allowed.
so this should be allowed:
[0000] foo-bar
[0000] foo
[0000] foo-bar-foo
etc..
so far i have this:
\[[0-9]{4}\]\s
this matches the [0000] , so it matches the brackets with 4 numbers in them and the whitespace.
i can't seem to find something that allows characters after that. i've tried putting a single "." at the end of the expression, as this should match any character, but this doesn't seem to be working.
\[[0-9]{4}\]\s^[A-Z]+[a-zA-Z]*$
the above isn't working either..
i need this expression as a Validationexpression for an asp.net custom validator.
any help will be appreciated

(\[[0-9]{4}\])\s+([A-Za-z\-]+) should hopefully work. It'll capture the numbers and letters into two capture groups as well.

This works for your input: http://regexr.com/?30sb7. Unlike Cornstalk's answer it does not capture anything, and - can indeed be placed later in a range if it's escaped.

Try this one
@"\[[0-9]{4}\] [a-zA-Z]+(-[a-zA-Z]+)*"
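For a quick check outside the validator, the last pattern can be tried against the sample inputs from the question. This is a sketch only: the class and method names (TagFormat, IsValid) are invented for illustration, and the ^ and $ anchors are added because Regex.IsMatch would otherwise accept a match anywhere inside the string.

```csharp
using System;
using System.Text.RegularExpressions;

public static class TagFormat
{
    // [dddd], one space, then letter runs separated by single hyphens.
    private static readonly Regex Pattern = new Regex(
        @"^\[[0-9]{4}\] [a-zA-Z]+(-[a-zA-Z]+)*$");

    public static bool IsValid(string value)
    {
        return Pattern.IsMatch(value);
    }
}
```

Note that this form rejects a leading, trailing or doubled hyphen, which is stricter than a plain character class such as [a-zA-Z-]+.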

I need a regular expression to convert US tel number to link

Basically, the input field is just a string. People input their phone number in various formats. I need a regular expression to find and convert those numbers into links.
Input examples:
(201) 555-1212
(201)555-1212
201-555-1212
555-1212
Here's what I want:
(201) 555-1212 - Notice the space is gone
(201)555-1212
201-555-1212
555-1212
I know it should be more robust than just removing spaces, but it is for an internal web site that my employees will be accessing from their iPhone. So, I'm willing to "just get it working."
Here's what I have so far in C# (which should show you how little I know about regular expressions):
strchk = Regex.Replace(strchk, @"\b([\d{3}\-\d{4}|\d{3}\-\d{3}\-\d{4}|\(\d{3}\)\d{3}\-\d{4}])\b", "<a href='tel:$&'>$&</a>", RegexOptions.IgnoreCase);
Can anyone help me by fixing this or suggesting a better way to do this?
EDIT:
Thanks everyone. Here's what I've got so far:
strchk = Regex.Replace(strchk, @"\b(\d{3}[-\.\s]\d{3}[-\.\s]\d{4}|\(\d{3}\)\s*\d{3}[-\.\s]\d{4}|\d{3}[-\.\s]\d{4})\b", "<a href='tel:$1'>$1</a>", RegexOptions.IgnoreCase);
It is picking up just about everything EXCEPT those with (nnn) area codes, with or without spaces between it and the 7 digit number. It does pick up the 7 digit number and link it that way. However, if the area code is specified it doesn't get matched. Any idea what I'm doing wrong?
Second Edit:
Got it working now. All I did was remove the \b from the start of the string.

Remove the [] and add \s* (zero or more whitespace characters) around each \-.
Also, you don't need to escape the -. (You can take out the \ from \-)
Explanation: [abcA-Z] is a character group, which matches a, b, c, or any character between A and Z.
It's not what you're trying to do.
Edits
In response to your updated regex:
Change [-\.\s] to [-\.\s]+ to match one or more of any of those characters (eg, a - with spaces around it)
The problem is that \b doesn't match the boundary between a space and a (.

Afaik, no phone enters the other characters, so why not replace [^0-9] with '' ?

Here's a regex I wrote for finding phone numbers:
(\+?\d[-\.\s]?)?(\(\d{3}\)\s?|\d{3}[-\.\s]?)\d{3}[-\.\s]?\d{4}
It's pretty flexible... allows a variety of formats.
Then, instead of killing yourself trying to replace it w/out spaces using a bunch of back references, instead pass the match to a function and just strip the spaces as you wanted.
C#/.net should have a method that allows a function as the replace argument...
Edit: They call it a MatchEvaluator. That example uses a delegate, but I'm pretty sure you could use the slightly less verbose
m => m.Value.Replace(" ", "")
or something. Working from memory here.
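A sketch of the MatchEvaluator approach, passed as a lambda to Regex.Replace. The pattern is the asker's second attempt with the leading \b removed, as described in the question's second edit; the class and method names (PhoneLinker, Linkify) are assumptions made for illustration.

```csharp
using System;
using System.Text.RegularExpressions;

public static class PhoneLinker
{
    // Three alternatives: nnn-nnn-nnnn, (nnn) nnn-nnnn, and nnn-nnnn.
    private static readonly Regex PhonePattern = new Regex(
        @"(\d{3}[-.\s]\d{3}[-.\s]\d{4}|\(\d{3}\)\s*\d{3}[-.\s]\d{4}|\d{3}[-.\s]\d{4})\b");

    public static string Linkify(string text)
    {
        // The lambda is converted to a MatchEvaluator and runs once per match.
        return PhonePattern.Replace(text, m =>
        {
            // Strip whitespace for the href only; the visible text stays as typed.
            string href = Regex.Replace(m.Value, @"\s", "");
            return "<a href='tel:" + href + "'>" + m.Value + "</a>";
        });
    }
}
```

Doing the cleanup inside the evaluator avoids juggling backreferences in the replacement string, since the link target and the link text can be built separately.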
