Repetitive pattern but the last one is different - Regex C#

I have this pattern:
^([a-zA-Z0-9]+ )+$
It is supposed to match sentences like:
sfjgsjsg_sbskdf_dsjkfshfsh
sdfhs_skjhsijdgh_dsnjbkg_sdkfsbk_nasjksdj_nsdjkfs
I don't know the word size nor how many words will be in each line.
The problem is that the pattern above only matches sentences like:
sfjgsjsg_sbskdf_dsjkfshfsh_
sdfhs_skjhsijdgh_dsnjbkg_sdkfsbk_nasjksdj_nsdjkfs_
where _ stands for a space.

You can use
^[a-zA-Z0-9]+(?: [a-zA-Z0-9]+)*$
Or, if any whitespace is meant:
^[a-zA-Z0-9]+(?:\s[a-zA-Z0-9]+)*$
If there can be only one occurrence of horizontal spaces:
^[a-zA-Z0-9]+(?:[\p{Zs}\t][a-zA-Z0-9]+)*$
and if there can be more than one:
^[a-zA-Z0-9]+(?:[\p{Zs}\t]+[a-zA-Z0-9]+)*$
Note that leading/trailing whitespace support can be added by placing a space followed by *, or [\p{Zs}\t]*, or \s*, right after the ^ anchor and right before the $ anchor.
Details:
^ - start of string
[a-zA-Z0-9]+ - one or more ASCII alphanumeric chars
' ' - a single space (the [\p{Zs}\t] variant matches a space separator or a tab, i.e. horizontal whitespace, while \s matches any whitespace, including line breaks)
(?: [a-zA-Z0-9]+)* - zero or more repetitions of a space and one or more ASCII alphanumeric chars
$ - end of string.
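To see the difference in practice, here is a minimal C# sketch that runs the pattern from the question and the first suggested pattern over a few inputs. The sample strings and the variable names are made up for illustration.

```csharp
using System;
using System.Text.RegularExpressions;

// Pattern from the question: every word, including the last one, must be followed by a space.
var original = new Regex(@"^([a-zA-Z0-9]+ )+$");
// Suggested pattern: one word, then zero or more "space + word" repetitions.
var suggested = new Regex(@"^[a-zA-Z0-9]+(?: [a-zA-Z0-9]+)*$");

// Illustrative inputs (not taken from the question).
string[] samples = { "abc def ghi", "abc def ghi ", "abc", "abc  def" };

foreach (var s in samples)
{
    Console.WriteLine($"\"{s}\" original={original.IsMatch(s)} suggested={suggested.IsMatch(s)}");
}
```

Only the suggested pattern accepts "abc def ghi" and the single word "abc", while only the original accepts the variant with a trailing space. Both reject the input with a doubled space.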

Related

Regex: Check if there are more than x line breaks

I need to validate a string according to the occurrence of line breaks.
The input is okay if there are no more than say 6 line breaks.
The input is not okay if there are more than say 6 line breaks.
Of course between the line breaks can (but does not have to) occur other characters.
I need to solve this solely within the regular expression because I cannot add any additional code.
I thought about something like this:
/^(\r\n|\r|\n){0,6}$/ // not working :[
You can use
Regex.IsMatch(input, @"^.*(?:\n.*){0,6}\z")
Or, if your line endings can be single CR/LF, you should bear in mind that in a .NET regex, . - without the RegexOptions.Singleline option - matches any chars but LF, and matches CR chars, so you will need to use something like
Regex.IsMatch(input, @"^[^\r\n]*(?:(?:\r\n?|\n)[^\r\n]*){0,6}\z")
The regex matches
^ - start of string
.* - any zero or more chars other than line feed (\n) char as many as possible (= a line)
(?:\n.*){0,6} - zero to six consecutive occurrences of an LF char and then any zero or more chars other than an LF char as many as possible
\z - the very end of string.
The second pattern matches
^ - start of string
[^\r\n]* - zero or more chars other than LF and CR as many as possible
(?:(?:\r\n?|\n)[^\r\n]*){0,6} - zero to six occurrences of
(?:\r\n?|\n) - either CRLF, or CR, or LF
[^\r\n]* - zero or more chars other than LF and CR as many as possible
\z - the very end of string.
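A quick way to check the second pattern is a small C# sketch like the one below. The helper name HasAtMostSixLineBreaks and the test strings are assumptions made for this example only.

```csharp
using System;
using System.Text.RegularExpressions;

// True when the input contains at most six line breaks (CRLF, lone CR, or lone LF).
// The helper name is hypothetical; the pattern is the second one from the answer.
bool HasAtMostSixLineBreaks(string input) =>
    Regex.IsMatch(input, @"^[^\r\n]*(?:(?:\r\n?|\n)[^\r\n]*){0,6}\z");

Console.WriteLine(HasAtMostSixLineBreaks("no breaks at all"));   // True
Console.WriteLine(HasAtMostSixLineBreaks("a\r\nb\rc\nd"));       // True  (3 breaks)
Console.WriteLine(HasAtMostSixLineBreaks(new string('\n', 6)));  // True  (6 breaks)
Console.WriteLine(HasAtMostSixLineBreaks(new string('\n', 7)));  // False (7 breaks)
```

Because \r\n? consumes a CRLF pair as one unit, a Windows-style line ending counts as a single break rather than two.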

Regex start new match at specific pattern

Hello, I'm kinda new to regex and have a small, maybe simple question.
I have the given text:
17.11.2020 15:32 typical Pat. seems sleeping
Additional test
17.11.2020 15:32 typical Pat. seems sleeping
Additional test
17.11.2020 15:32 typical Pat. seems sleeping
Additional test
My current regex (\d{2}.\d{2}.\d{4}\s\d{2}:\d{2})\s?(.*)
matches only up to sleeping, but correctly creates 3 matches.
But I need the Additional test text in the second group as well.
I tried something like (\d{2}.\d{2}.\d{4}\s\d{2}:\d{2})\s?([,.:\w\s]*) but now I have only one huge match because the second group takes everything until the end.
How can I match everything until a new line with a date starts and create a new match from there on?
If you are sure there is only one additional line to be matched you can use
(?m)^(\d{2}\.\d{2}\.\d{4}\s\d{2}:\d{2})\s*(.*(?:\n.*)?)
See the regex demo. Details:
(?m) - a multiline modifier
^ - start of a line
(\d{2}\.\d{2}\.\d{4}\s\d{2}:\d{2}) - Group 1: a datetime string
\s* - zero or more whitespaces
(.*(?:\n.*)?) - Group 2: any zero or more chars other than a newline char as many as possible and then an optional line, a newline followed with any zero or more chars other than a newline char as many as possible.
If there can be any amount of lines, you may consider
(?m)^(\d{2}\.\d{2}\.\d{4}[\p{Zs}\t]\d{2}:\d{2})[\p{Zs}\t]*(?s)(.*?)(?=\n\d{2}\.\d{2}\.\d{4}|\z)
See this regex demo. Here,
(?m)^(\d{2}\.\d{2}\.\d{4}[\p{Zs}\t]\d{2}:\d{2}) - matches the same as above, just \s is replaced with [\p{Zs}\t] that only matches horizontal whitespace
[\p{Zs}\t]* - 0+ horizontal whitespace chars
(?s) - now, . will match any chars including a newline
(.*?) - Group 2: any zero or more chars, as few as possible
(?=\n\d{2}\.\d{2}\.\d{4}|\z) - up to the leftmost occurrence of a newline, followed with a date string, or up to the end of string.
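As a rough check of the "any amount of lines" pattern, the following C# sketch runs it over the sample text from the question. It assumes the lines are separated by LF only; with CRLF input the second group would keep a trailing \r. The variable names are illustrative.

```csharp
using System;
using System.Text.RegularExpressions;

// Sample text from the question, joined with LF line endings (an assumption).
string text = string.Join("\n",
    "17.11.2020 15:32 typical Pat. seems sleeping", "Additional test",
    "17.11.2020 15:32 typical Pat. seems sleeping", "Additional test",
    "17.11.2020 15:32 typical Pat. seems sleeping", "Additional test");

var pattern = new Regex(
    @"(?m)^(\d{2}\.\d{2}\.\d{4}[\p{Zs}\t]\d{2}:\d{2})[\p{Zs}\t]*(?s)(.*?)(?=\n\d{2}\.\d{2}\.\d{4}|\z)");

MatchCollection matches = pattern.Matches(text);
Console.WriteLine(matches.Count); // 3

foreach (Match m in matches)
{
    // Show the embedded newline as "\n" so each entry prints on one line.
    string body = m.Groups[2].Value.Replace("\n", "\\n");
    Console.WriteLine($"[{m.Groups[1].Value}] [{body}]");
}
// Each entry prints as:
// [17.11.2020 15:32] [typical Pat. seems sleeping\nAdditional test]
```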
Your character class [,.:\w\s]* repeats \s with the * quantifier, and since \s also matches newlines, it matches too much.
Since .* does not match a newline, (.*\r?\n.*) matches the rest of the line, then a newline and the next line, all in the same group.
^(\d{2}.\d{2}.\d{4}\s\d{2}:\d{2})\s?(.*\r?\n.*)
If multiple lines can follow, match all following lines that do not start with a date like pattern.
^(\d{2}\.\d{2}\.\d{4})\s*(.*(?:\r?\n(?!\d{2}\.\d{2}\.\d{4}).*)*)
Explanation
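^ Start of the string (or of each line with RegexOptions.Multiline or (?m), which this pattern needs in order to produce one match per entry)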
( Capture group1
\d{2}\.\d{2}\.\d{4} Match a date like pattern
) Close group 1
\s* Match 0+ whitespace chars (Or match whitespace chars without newlines [^\S\r\n]*)
( Capture group 2
.* Match the whole line
(?:\r?\n(?!\d{2}\.\d{2}\.\d{4}).*)* Optionally repeat matching the whole line if it does not start with a date like pattern
) Close group 2
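Here is a small C# sketch of the negative-lookahead pattern above, again on the sample text from the question. It assumes LF line endings and passes RegexOptions.Multiline so that ^ matches at every line start. Note that group 1 holds only the date here, so the time ends up in group 2.

```csharp
using System;
using System.Text.RegularExpressions;

// Sample text from the question, joined with LF line endings (an assumption).
string text = string.Join("\n",
    "17.11.2020 15:32 typical Pat. seems sleeping", "Additional test",
    "17.11.2020 15:32 typical Pat. seems sleeping", "Additional test",
    "17.11.2020 15:32 typical Pat. seems sleeping", "Additional test");

// RegexOptions.Multiline lets ^ match at the start of every line.
var pattern = new Regex(
    @"^(\d{2}\.\d{2}\.\d{4})\s*(.*(?:\r?\n(?!\d{2}\.\d{2}\.\d{4}).*)*)",
    RegexOptions.Multiline);

MatchCollection matches = pattern.Matches(text);
Console.WriteLine(matches.Count); // 3

foreach (Match m in matches)
{
    // Show the embedded newline as "\n" so each entry prints on one line.
    string body = m.Groups[2].Value.Replace("\n", "\\n");
    Console.WriteLine($"[{m.Groups[1].Value}] [{body}]");
}
// Each entry prints as:
// [17.11.2020] [15:32 typical Pat. seems sleeping\nAdditional test]
```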

Remove Dashes but Not Hyphens

I want to remove dashes before, after, and between spaced words, but not hyphenated words.
This- -is - a test-sentence. -Test- --- One-Two--Three---Four----.
should become:
This is a test-sentence. Test One-Two--Three---Four.
Remove multiple dashes ---.
Keep multiple hyphens Three---Four.
I was trying to do it with this:
http://rextester.com/SXQ57185
string sentence = "This- -is - a test-sentence. -Test- --- One-Two--Three---Four----.";
string regex = @"(?<!\w)\-(?!\-)|(?<!\-)\-(?!\w)";
sentence = Regex.Replace(sentence, regex, "");
Console.WriteLine(sentence);
But the output is:
This is a test-sentence. Test - One-TwoThree-Four--.
What I would recommend doing is a combination of both a positive lookbehind and a positive lookahead against the characters that you don't want the dashes to be next to. In your case, that would be spaces and full stops. If either the lookbehind or lookahead match, you want to remove that dash.
This would be: ((?<=[\s\.])\-+)|(\-+(?=[\s\.])).
Breaking this down:
((?<=[\s\.])\-+) - match hyphens that follow either a space or a full stop
| - or
(\-+(?=[\s\.])) - match hyphens that are followed by either a space or a full stop
Here's a JavaScript example showcasing that:
const string = 'This- -is - a test-sentence. -Test- --- One-Two--Three---Four----.';
const regex = /((?<=[\s\.])\-+)|(\-+(?=[\s\.]))/g;
console.log(string.replace(regex, ''));
And this can also be seen on Regex101.
Note that removing the dashes leaves doubled spaces behind. In C#, .Trim() only strips leading and trailing whitespace, so the interior runs need a second replacement that collapses two or more spaces into one.
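Since the question is about C#, here is a rough C# equivalent of the JavaScript example. It uses the same lookbehind/lookahead idea with the redundant escapes and capture groups dropped, and adds a second replacement to collapse the doubled spaces left behind. The variable names are illustrative.

```csharp
using System;
using System.Text.RegularExpressions;

string sentence = "This- -is - a test-sentence. -Test- --- One-Two--Three---Four----.";

// Remove runs of dashes that directly follow, or are directly followed by,
// whitespace or a full stop.
string noDashes = Regex.Replace(sentence, @"(?<=[\s.])-+|-+(?=[\s.])", "");

// Removing the dashes leaves doubled spaces behind; collapse them and trim the ends.
string cleaned = Regex.Replace(noDashes, @" {2,}", " ").Trim();

Console.WriteLine(cleaned);
// This is a test-sentence. Test One-Two--Three---Four.
```

Dashes at the very start or end of the string are not next to whitespace or a full stop, so this sketch would leave them in place; the sample sentence does not contain that case.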
You can use \b|\s for this task.
/(\b|\s)(-{3})(\b|\s)/g
Breakdown shamelessly copied from regex101.com:
/(\b|\s)(-{3})(\b|\s)/g
1st Capturing Group (\b|\s)
1st Alternative \b
\b assert position at a word boundary (^\w|\w$|\W\w|\w\W)
2nd Alternative \s
\s matches any whitespace character (equal to [\r\n\t\f\v ])
2nd Capturing Group (-{3})
-{3} matches the character - literally (case sensitive)
{3} Quantifier — Matches exactly 3 times
3rd Capturing Group (\b|\s)
1st Alternative \b
\b assert position at a word boundary (^\w|\w$|\W\w|\w\W)
2nd Alternative \s
\s matches any whitespace character (equal to [\r\n\t\f\v ])
You may just match all hyphens in between word chars, and remove all others with a simple
Regex.Replace(s, @"\b(-+)\b|-", "$1")
Details
\b(-+)\b - word boundary, followed with 1+ hyphens, and then again a word boundary (that is, hyphen(s) in between letters, digits and underscores)
| - or
- - a hyphen in other contexts (it will be removed).
See the C# demo:
var s = "This- -is - a test-sentence. -Test- --- One-Two--Three---Four----.";
var result = Regex.Replace(s, @"\b(-+)\b|-", "$1");
Console.WriteLine(result);
// => This is a test-sentence. Test One-Two--Three---Four.

Regex: non-zero number followed by one or more spaces followed by non-zero number

Trying to match a user input in the format [1-9], whitespace, [1-9]
So
1 1 should pass
1 0 should fail
new Regex(@"^[1-9]+\s+\d+").IsMatch(input) //works but allows 0 for the 2nd number
new Regex(@"^[1-9]+\s+\[1-9]+").IsMatch(input) //does not work for some reason
I feel like I'm missing something super basic, but I can't find the answer.
Both your regexps do not work as intended. The ^[1-9]+\s+\d+ pattern matches 1+ digits from 1 to 9, then 1+ whitespaces, and then any 1+ digits that can be followed with anything, any number of any chars. The ^[1-9]+\s+\[1-9]+ pattern contains an escaped [ and instead of matching any 1+ digits from 1 to 9 with [1-9] your \[1-9]+ matches a [, then 1-9 substring and then 1+ ] chars.
If you plan to match strings consisting of single non-zero digits separated with 1+ whitespaces, use @"^[1-9]\s+[1-9]$". See this regex demo.
If you plan to match a string that consists of two digit chunks not starting with 0 and separated with 1 or more whitespace chars, use
@"^[1-9][0-9]*\s+[1-9][0-9]*$"
See the regex demo. Note that $ is an end of string anchor that does not allow almost any chars after it (it does allow an \n at the end of the string, so, you might want to use \z instead of $).
Pattern details
^ - start of string
[1-9] - a digit from 1 to 9
[0-9]* - zero or more digits
\s+ - 1+ whitespace chars
[1-9][0-9]* - see above
$ / \z - end of string / the very end of string anchors.
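The difference between the two end anchors is easy to miss, so here is a minimal C# sketch comparing them. The variable names and the extra test inputs are assumptions for illustration.

```csharp
using System;
using System.Text.RegularExpressions;

// Two numbers without leading zeros, separated by one or more whitespace chars.
var withDollar = new Regex(@"^[1-9][0-9]*\s+[1-9][0-9]*$");
var withSlashZ = new Regex(@"^[1-9][0-9]*\s+[1-9][0-9]*\z");

Console.WriteLine(withSlashZ.IsMatch("1 1"));     // True
Console.WriteLine(withSlashZ.IsMatch("1 0"));     // False
Console.WriteLine(withSlashZ.IsMatch("10   20")); // True
Console.WriteLine(withSlashZ.IsMatch("1 1 abc")); // False

// The only difference: $ tolerates a single trailing newline, \z does not.
Console.WriteLine(withDollar.IsMatch("1 1\n"));   // True
Console.WriteLine(withSlashZ.IsMatch("1 1\n"));   // False
```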

Regular Expression for no repeating special characters (C#)

I am new to regular expressions and need a regular expression for address, in which user cannot enter repeating special characters such as: ..... or ,,,.../// etc and none of the special characters could be entered more than 5 times in the string.
...,,,....// =>No Match
Street no. 40. hello. =>Match
Thanks in advance!
I have tried this:
([a-zA-Z]+|[\s\,\.\/\-]+|[\d]+)|(\(([\da-zA-Z]|[^)^(]+){1,}\))
It selects all alphanumeric and some special characters, with no empty brackets.
You can use Negative lookahead construction that asserts what is invalid to match. Its format is (?! ... )
For your case you can try something like this:
This will not match the input string if it has 2 or more consecutive dots, commas or slashes (or any combination of them)
(?!.*[.,\/]{2}) ... rest of the regex
This will not match the input string if it has 5 or more 'A' characters (use {6} instead to allow exactly 5).
(?!(.*A.*){5}) ... rest of the regex
This will match everything except your restrictions. Replace the last part (.*) with your regex.
^(?!.*[.,\/]{2})(?!(.*\..*){5})(?!(.*,.*){5})(?!(.*\/.*){5}).*$
Note: This regex may not be optimized. It may be faster to loop over the string's characters and count their occurrences.
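A minimal C# sketch of the combined lookahead pattern above, run against the two examples from the question plus two made-up inputs that sit on either side of the five-occurrence limit. The variable name validator is an assumption.

```csharp
using System;
using System.Text.RegularExpressions;

// Lookahead-based validator from the answer above; the final ".*" accepts any remaining text.
var validator = new Regex(@"^(?!.*[.,\/]{2})(?!(.*\..*){5})(?!(.*,.*){5})(?!(.*\/.*){5}).*$");

Console.WriteLine(validator.IsMatch("...,,,....//"));          // False (consecutive specials)
Console.WriteLine(validator.IsMatch("Street no. 40. hello.")); // True
Console.WriteLine(validator.IsMatch("a.b.c.d.e"));             // True  (4 dots)
Console.WriteLine(validator.IsMatch("a.b.c.d.e.f"));           // False (5 dots)
```

As written, the {5} quantifier rejects the fifth occurrence of a character, not the sixth.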
You can use this regex:
^(?![^,./-]*([,./-])\1)(?![^,./-]*([,./-])(?:[^,./-]*\2){4})[ \da-z,./-]+$
In C#:
foundMatch = Regex.IsMatch(yourString, @"^(?![^,./-]*([,./-])\1)(?![^,./-]*([,./-])(?:[^,./-]*\2){4})[ \da-z,./-]+$", RegexOptions.IgnoreCase);
Explanation
The ^ anchor asserts that we are at the beginning of the string
The negative lookahead (?![^,./-]*([,./-])\1) asserts that it is not possible to match any number of non-special chars, followed by one special char (captured to Group 1), followed by the same special char (the \1 backreference)
The negative lookahead (?![^,./-]*([,./-])(?:[^,./-]*\2){4}) asserts that it is not possible to match any number of non-special chars, followed by one special char (captured to Group 2), then four repetitions of any number of non-special chars followed by that same char from Group 2 (five occurrences in total)
The $ anchor asserts that we are at the end of the string
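Here is a small C# sketch that exercises this pattern on the two examples from the question, plus one made-up input that shows a limitation. The variable name validator is an assumption.

```csharp
using System;
using System.Text.RegularExpressions;

var validator = new Regex(
    @"^(?![^,./-]*([,./-])\1)(?![^,./-]*([,./-])(?:[^,./-]*\2){4})[ \da-z,./-]+$",
    RegexOptions.IgnoreCase);

Console.WriteLine(validator.IsMatch("...,,,....//"));          // False
Console.WriteLine(validator.IsMatch("Street no. 40. hello.")); // True

// Both lookaheads can only reach the first special character in the string,
// so a doubled special char that appears later is not rejected:
Console.WriteLine(validator.IsMatch("a.b.."));                 // True
```

As far as I can tell, this happens because [^,./-]* cannot skip past a special character, so the check is tied to the first special char only. If every position has to be checked, lookaheads that start with .* scan the whole string instead.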
A regular expression string to detect invalid strings is:
[^\w \-\r\n]{2}|(?:[\w \-]+[^\w \-\r\n]){5}
As C# string literal (regular and verbatim):
"[^\\w \\-\\r\\n]{2}|(?:[\\w \\-]+[^\\w \\-\\r\\n]){5}"
@"[^\w \-\r\n]{2}|(?:[\w \-]+[^\w \-\r\n]){5}"
It is much easier to find a string than to validate if a string does not contain ...
It can be checked with this expression if the string entered by the user is invalid because of a match of 2 special characters in sequence OR 5 special characters used in the string.
Explanation:
[^...] ... a negative character class definition which matches any character NOT being one of the characters listed within the square brackets.
\w ... a word character which is either a letter, a digit or an underscore.
The next character is simply a space character.
\- ... the hyphen character which must be escaped with a backslash within square brackets as otherwise the hyphen character would be interpreted as "FROM x TO z" (except when being the first or the last character within the square brackets).
\r ... carriage return
\n ... line-feed
Therefore [^\w \-\r\n] finds a character which is NOT a letter, NOT a digit, NOT an underscore, NOT a space, NOT a hyphen, NOT a carriage return and also NOT a line-feed.
{2} ... the preceding expression must match 2 such characters.
So with the expression [^\w \-\r\n]{2} it can be checked if the string contains 2 special characters in a sequence which makes the string invalid.
| ... OR
(?:...) ... non-capturing group, needed here to apply the {5} quantifier to the whole expression inside, so that it must match exactly 5 times in a row.
[...] ... a positive character class definition which matches any character being one of the characters listed within the square brackets.
[\w \-]+ ... find a word character, or a space, or a hyphen 1 or more times.
[^\w \-\r\n] ... and next character being NOT a word character, space, hyphen, carriage return or line-feed.
Therefore (?:[\w \-]+[^\w \-\r\n]){5} finds a string with 5 "special" characters between "standard" characters.
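Because this expression detects invalid input, a match means "reject". A minimal C# sketch, using the two examples from the question and one made-up input with five dots (the variable name invalid is an assumption):

```csharp
using System;
using System.Text.RegularExpressions;

// Matches when the input is INVALID: two special chars in a row,
// or five special chars each preceded by one or more standard chars.
var invalid = new Regex(@"[^\w \-\r\n]{2}|(?:[\w \-]+[^\w \-\r\n]){5}");

Console.WriteLine(invalid.IsMatch("...,,,....//"));          // True  (invalid)
Console.WriteLine(invalid.IsMatch("Street no. 40. hello.")); // False (valid)
Console.WriteLine(invalid.IsMatch("a.b.c.d.e."));            // True  (invalid: 5 dots)
```

In calling code the result would typically be negated, for example accepting the address only when IsMatch returns false.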
