I'm trying to write a regex to capture periods in the middle of a text (like.this).
I've written a few exceptions for numbers and quotations etc, but I can't figure out how to get it to allow three periods in the middle of a sentence (like...this).
The following should not be a match:
." .“ not...match 7.30
And the following should be a match:
is.match
At the moment my regex looks like this:
(\.[^ 0-9."“])
Hope someone can help me as I'm really stuck.
Kind regards
Edited to make myself more clear
Solution
Ended up using this https://regex101.com/r/NcKJxj/1
(?<=\p{L})(?:\.{1,2}|\.{4,})(?=[^ 0-9."“]\p{L})
The pattern (\.[^ 0-9."“]) does not match ." or .“ on its own. It does the opposite: it matches a . followed by any character that is not listed in the character class.
If you don't want to match ... but do want to match 1, 2 or 4+ dots:
(?<=\p{L})(?:\.{1,2}|\.{4,})(?=[^ 0-9."“]\p{L})
The pattern matches:
(?<=\p{L}) Positive lookbehind, assert any letter to the left
(?:\.{1,2}|\.{4,}) Match either 1, 2 or 4+ dots
(?=[^ 0-9."“]\p{L}) Positive lookahead, assert any char other than those listed in the character class, followed by any letter
.NET Regex demo
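As a rough check of the pattern above, the following C# sketch collects every run of dots it matches. The class and method names (MidTextDots, Find) are made up for illustration and are not part of the original answer.

```csharp
using System.Collections.Generic;
using System.Text.RegularExpressions;

public static class MidTextDots
{
    // Pattern copied from the answer. In a verbatim string, "" stands for one
    // literal double quote inside the character class.
    private static readonly Regex Dots =
        new Regex(@"(?<=\p{L})(?:\.{1,2}|\.{4,})(?=[^ 0-9.""“]\p{L})");

    // Hypothetical helper: returns the text of each matched run of dots.
    public static List<string> Find(string text)
    {
        var found = new List<string>();
        foreach (Match m in Dots.Matches(text))
        {
            found.Add(m.Value);
        }
        return found;
    }
}
```

Calling MidTextDots.Find("is.match") should return a single ".", while "not...match" and "7.30" should return nothing. Note that the lookahead needs two characters after the dots, so a one-letter tail such as "a.b" is not matched.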
Related
I'm trying to get a regex in C# to parse an mqtt topic to know which action to perform for each topic type we defined in our system. We have two topics that must be differentiated:
cd/hl/projects/{project_id}/var/{var_name}
cd/hl/projects/{project_id}/var/{var_name}/write
{project_id} can be any character but a line break (\n). Can be empty
{var_name} can be any character but a line break (\n). Can be empty
To match string 2 I use the following, and it works in all the cases I tested:
^cd/hl/projects/.*/var/.*/write$
So far so good. But I fail when I try to match 1 without also getting a match on 2, using the following regex:
^(cd/hl/projects/.*/var/.*)(?!/write)$
what I think should do (but it doesn't):
(cd/hl/projects/.*/var/.*) #match any projects and var_names
(?!/write) #not match if /write appears
The problem is that I can't stop matching strings that have /write in the end like:
cd/hl/projects/4d69439d-8c13-4e83-9ed5-60659d953f9f/var/test_count_one/write
I just want not to match the string above but only with:
cd/hl/projects/{project_id}/var/{var_name}
My questions are: Am I following the right approach? What am I missing? How can I achieve my goal?
Thanks
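The subparts should contain no /, not just no \n. Note also that (?!/write)$ can never fail: the lookahead is checked at the position where $ matches, and nothing follows the end of the string, so /write is never found there. You should replace .* with [^/\n]* and use
REGEX 1: ^cd/hl/projects/([^/\n]*)/var/([^/\n]*)$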
REGEX 2: ^cd/hl/projects/([^/\n]*)/var/([^/\n]*)/write$
See the regex demo 1 and regex demo 2.
The [^/\n]* negated character class matches any 0+ (due to *) chars other than / and \n.
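A minimal sketch of how the two patterns could be used to route a topic, assuming both are anchored with $ at the end so the first one cannot match a prefix of a /write topic. The class name, method name and returned labels are hypothetical.

```csharp
using System.Text.RegularExpressions;

public static class TopicRouter
{
    // Both segments use [^/\n]* so that a segment can never swallow a slash.
    private static readonly Regex VarTopic =
        new Regex(@"^cd/hl/projects/([^/\n]*)/var/([^/\n]*)$");

    private static readonly Regex WriteTopic =
        new Regex(@"^cd/hl/projects/([^/\n]*)/var/([^/\n]*)/write$");

    // Returns "write", "var" or "unknown" (labels chosen for this example only).
    public static string Classify(string topic)
    {
        if (WriteTopic.IsMatch(topic)) return "write";
        if (VarTopic.IsMatch(topic)) return "var";
        return "unknown";
    }
}
```

Because neither segment can contain /, a topic ending in /write no longer satisfies the first pattern, so the order of the two checks does not matter here. The captured groups 1 and 2 hold {project_id} and {var_name} if they are needed later.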
PFB the regex. I want to make sure that the regex does not allow any special character just after @ or just before it. In between, it can allow any combination.
The regex I have now:
@"^[^\W_](?:[\w.-]*[^\W_])?@(([a-zA-Z0-9]+)(\.))([a-zA-Z]{2,3}|[0-9]{1,3})(\]?)$"
For example, the regex should not match
abc@.sj.com
abc@-.sj-.com
SSDFF-SAF@-_.SAVAVSAV-_.IP
Since you consider _ special, I'd recommend using [^\W_] at the beginning and then rearranging the starting part a bit. To prevent a special char before the @, just make sure there is a letter or digit there. I also recommend removing redundant capturing groups or converting them into non-capturing ones:
@"^[^\W_](?:[\w.-]*[^\W_])?@(?:\[[0-9]{1,3}\.[0-9]{1,3}\.[0-9]{1,3}\.|(?:[\w-]+\.)+)(?:[a-zA-Z]{2,3}|[0-9]{1,3})\]?$"
Here is a demo of how this regex matches now.
The [^\W_](?:[\w.-]*[^\W_])? matches:
[^\W_] - a digit or a letter only
(?:[\w.-]*[^\W_])? - 1 or 0 occurrences of:
[\w.-]* - 0+ letters, digits, _, . and -
[^\W_] - a digit or a letter only
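The breakdown above can be tried out with a short C# sketch. It assumes the separator in the address is @ and tests the local part on its own by adding a $ anchor, which is not in the answer's pattern. The names AddressCheck, IsValidLocalPart and IsValidAddress are invented for this example.

```csharp
using System.Text.RegularExpressions;

public static class AddressCheck
{
    // The local-part sub-pattern from the breakdown, anchored at both ends
    // so it can be tested in isolation (the trailing $ is an assumption).
    private static readonly Regex LocalPart =
        new Regex(@"^[^\W_](?:[\w.-]*[^\W_])?$");

    // The full pattern suggested in the answer.
    private static readonly Regex Full = new Regex(
        @"^[^\W_](?:[\w.-]*[^\W_])?@(?:\[[0-9]{1,3}\.[0-9]{1,3}\.[0-9]{1,3}\.|(?:[\w-]+\.)+)(?:[a-zA-Z]{2,3}|[0-9]{1,3})\]?$");

    public static bool IsValidLocalPart(string s) => LocalPart.IsMatch(s);

    public static bool IsValidAddress(string s) => Full.IsMatch(s);
}
```

With these helpers, a local part such as "a.b-c" passes while "abc-" and "_abc" fail, because the first and last characters must be a letter or digit. Likewise "abc@.sj.com" is rejected by the full pattern, since the domain alternative needs at least one word character or hyphen before the first dot.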
Change the initial [\w-\.]+ for [A-Za-z0-9\-\.]+.
Note that this excludes many acceptable email addresses.
Update
As pointed out, [A-Za-z0-9] is not an exact translation of \w. However, you appear to have a specific definition as to what you consider special characters and so it is probably easier for you to define within the square brackets what you class as allowable.
I am new to RegEx and thus have a question on RegEx. I am writing my code in C# and need to come up with a regex to find matching strings.
The possible combination of strings i get are,
XYZF44DT508755
ABZF44DT508755
PQZF44DT508755
So what I need to check is whether the string starts with XY or AB or PQ.
I came up with this one and it doesn't work.
^((XY|AB|PQ).){2}
Note: I don't want to use regular string StartsWith()
UPDATE:
Now if i want to try a new matching condition like this -
If string starts with "XY" or "AB" or "PQ" and 3rd character is "Z" and 4th character is "F"
How to write the RegEx for that?
You can modify your expression to the following and use the IsMatch() method.
Regex.IsMatch(input, "^(?:XY|AB|PQ)")
The outer capturing group, in conjunction with . (any single character), tries to match a third character, and the range quantifier {2} then requires the whole sequence twice ...
According to your updated edit, you can simply place "ZF" after the grouping construct.
Regex.IsMatch(input, "^(?:XY|AB|PQ)ZF")
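To make the difference concrete, here is a small sketch that wraps the two suggested patterns and the original attempt in helper methods. The class and method names are hypothetical and only used for this comparison.

```csharp
using System.Text.RegularExpressions;

public static class PrefixCheck
{
    // Starts with XY, AB or PQ.
    public static bool HasPrefix(string input) =>
        Regex.IsMatch(input, "^(?:XY|AB|PQ)");

    // Starts with XY, AB or PQ, followed by Z and then F.
    public static bool HasPrefixThenZF(string input) =>
        Regex.IsMatch(input, "^(?:XY|AB|PQ)ZF");

    // The pattern from the question, kept only to show why it fails.
    public static bool OriginalAttempt(string input) =>
        Regex.IsMatch(input, "^((XY|AB|PQ).){2}");
}
```

For "XYZF44DT508755" the first two helpers return true, while OriginalAttempt returns false: after consuming "XYZ" it expects XY, AB or PQ again and finds "F4" instead.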
You want to test for just ^(XY|AB|PQ). Your regex means: match either XY, AB or PQ, then any single character, and repeat that whole sequence twice. For example, "XYKPQL" would match your regex.
Breaking the pattern down:
^ anchors the match at the start of the line,
(...) creates a capturing group and
XY|AB|PQ matches either XY, AB or PQ.
If you want the next two characters to be ZF, just append ZF to the RegEx so it becomes ^(XY|AB|PQ)ZF.
Check out regex101, a great way to test your RegExes.
You were on the right track. ^(XY|AB|PQ) should match your string correctly.
The problem with ^((XY|AB|PQ).){2} is following the entire group with {2}. This means exactly 2 occurrences. That would be 2 occurrences of your first 2 characters, plus . (any single character), meaning this would match strings like XY_AB_. The _ could be anything.
It may have been your intention with the . to match a larger string. In this case you might try something along the lines of ^((XY|AB|PQ)\w*). The \w* will match 0 or more occurrences of "word characters", so this should match all of XYZF44DT508755 up to a space, line break, punctuation, etc., and not just the XY at the beginning.
There are some good tools out there for understanding regexes, one of my favorites is debuggex.
UPDATE
To answer your updated question:
If string starts with "XY" or "AB" or "PQ" and 3rd character is "Z" and 4th character is "F"
The regex would be (assuming you want to match the entire "word").
^((XY|AB|PQ)ZF\w*)
Debuggex Demo
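If the matched word itself is needed rather than a yes/no answer, a sketch along these lines could be used. WholeWord and Extract are made-up names, and returning an empty string for "no match" is just a choice made for this example.

```csharp
using System.Text.RegularExpressions;

public static class WholeWord
{
    // Prefix XY, AB or PQ, then ZF, then any remaining word characters.
    private static readonly Regex Pattern = new Regex(@"^((XY|AB|PQ)ZF\w*)");

    // Returns the matched word, or an empty string when the input does not match.
    public static string Extract(string input)
    {
        Match m = Pattern.Match(input);
        return m.Success ? m.Value : string.Empty;
    }
}
```

For an input like "PQZF44DT508755 rest", \w* stops at the space, so only the leading word is returned. Group 2 of the match holds the two-letter prefix if that is needed separately.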
I have the following regex:
Regex pattern = new Regex(@"^(?=.*\d)(?=.*[a-z])(?=.*[A-Z])[0-9a-zA-Z]{8,20}/(.)$");
(?=.*\d) //should contain at least one digit
(?=.*[a-z]) //should contain at least one lower case
(?=.*[A-Z]) //should contain at least one upper case
[a-zA-Z0-9]{8,20} //should contain at least 8 characters and maximum of 20
My problem is I also need to check if 3 consecutive characters are identical. Upon searching, I saw this solution:
/(.)\1\1/
However, I can't make it work when I combine it with my existing regex. Still no luck:
Regex(@"^(?=.*\d)(?=.*[a-z])(?=.*[A-Z])[0-9a-zA-Z]{8,20}$/(.)\1\1/");
What did I miss here? Thanks!
The problem is that /(.)\1\1/ includes the surrounding / characters which are used to quote literal regular expressions in some languages (like Perl). But even if you don't use the quoting characters, you can't just add it to a regular expression.
At the beginning of your regex, you have to say "What follows cannot contain a character followed by itself and then itself again", like this: (?!.*(.)\1\1). The (?! starts a zero-width negative lookahead assertion. The "zero-width" part means that it does not consume any characters in the input string, and the "negative lookahead assertion" part means that it looks forward in the input string to make sure that the given pattern does not appear anywhere.
All told, you want a regex like this:
new Regex(@"^(?!.*(.)\1\1)(?=.*\d)(?=.*[a-z])(?=.*[A-Z])[0-9a-zA-Z]{8,20}$")
I solved it through trial and error:
Regex pattern = new Regex(@"^(?!.*(.)\1\1)(?=.*\d)(?=.*[a-z])(?=.*[A-Z])[0-9a-zA-Z]{8,20}$");
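A short sketch of how the combined pattern might be exercised, using the C# verbatim-string prefix @. The class name PasswordRule and the sample passwords are invented for illustration.

```csharp
using System.Text.RegularExpressions;

public static class PasswordRule
{
    // (?!.*(.)\1\1) rejects any character repeated three times in a row;
    // the three positive lookaheads require a digit, a lower-case and an
    // upper-case letter; the final class limits length and allowed characters.
    private static readonly Regex Rule = new Regex(
        @"^(?!.*(.)\1\1)(?=.*\d)(?=.*[a-z])(?=.*[A-Z])[0-9a-zA-Z]{8,20}$");

    public static bool IsValid(string candidate) => Rule.IsMatch(candidate);
}
```

Here "Passw0rd" passes because "ss" is only a double, while "Passsw0rd" is rejected by the negative lookahead. All lookaheads are evaluated at the start of the string, so their order does not change the result.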
I have the following string:
test123 test ödo 123teö"st 123 m.1212 123t.est
I only want to match strings as a whole that have letters, digits and special characters mixed together. So the regex should match the following parts of the example above:
test123 test ödo 123teö"st 123 m.1212 123t.est
Could someone help me out please?
Update
Sorry for not giving a clear explanation of what I need.
I am using C#.
I need to find words that contain alphanumeric strings (eg abc123, 123abc, a1b2c3, 1abc23 etc). I also need to find strings that contain any kind of symbol (symbols = anything other than word characters and digits) (eg abc"123, "abc, ab?dd, 100mm", 345t{asd]dd)
If I find a match, I need to "tokenize" (separate digits, word characters and symbols with whitespace) these strings so abc123 becomes abc 123 or 345t{asd]dd becomes 345 t { asd ] dd etc
Assuming you're using a regex flavor that supports lookaheads and Unicode properties, this should get you started:
(?!(?:\pL+|\pN+|\pP+)(?!\S))\S+
\S+ matches one or more non-whitespace characters, but only after the negative lookahead asserts that those characters are not all letters (\pL), digits (\pN), or punctuation (\pP). The inner negative lookahead--(?!\S)--ensures that the outer one examines all the characters in the word.
Although it might satisfy your requirements, this regex is really just a demonstration of the technique you'll probably want to use. As it is, it can be fooled by "words" with (for example) control characters or dingbats in them.
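Since the question mentions C#, here is a rough sketch of the technique in .NET. It assumes the property escapes are written with braces (\p{L} rather than \pL), which is the form .NET accepts. The Tokenize helper and its Piece pattern are not from the answer; they are one hypothetical way to do the whitespace separation described in the question.

```csharp
using System.Collections.Generic;
using System.Linq;
using System.Text.RegularExpressions;

public static class MixedWords
{
    // Same idea as the pattern above: skip words made only of letters,
    // only of digits, or only of punctuation.
    private static readonly Regex Mixed =
        new Regex(@"(?!(?:\p{L}+|\p{N}+|\p{P}+)(?!\S))\S+");

    // Hypothetical helper pattern: a run of letters, a run of digits,
    // or one single character that is neither of those nor whitespace.
    private static readonly Regex Piece =
        new Regex(@"\p{L}+|\p{N}+|[^\p{L}\p{N}\s]");

    public static List<string> Find(string text) =>
        Mixed.Matches(text).Cast<Match>().Select(m => m.Value).ToList();

    public static string Tokenize(string word) =>
        string.Join(" ", Piece.Matches(word).Cast<Match>().Select(m => m.Value));
}
```

With these helpers, Find should skip plain words such as "test" and "123" but return "test123" and "m.1212", and Tokenize("345t{asd]dd") should give "345 t { asd ] dd". The same caveat applies as for the pattern itself: characters outside the letter, number and punctuation categories make a word count as mixed.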
The answer to the question you’ve actually asked is (?s).+, but perhaps you would care to refine your question.