Regex to identify MQTT topics - c#

I'm trying to write a regex in C# to parse an MQTT topic, so that I know which action to perform for each topic type defined in our system. We have two topics that must be differentiated:
cd/hl/projects/{project_id}/var/{var_name}
cd/hl/projects/{project_id}/var/{var_name}/write
{project_id} can contain any character except a line break (\n) and can be empty.
{var_name} can contain any character except a line break (\n) and can be empty.
To match topic 2 I use the following, and it works in all the cases I tested:
^cd/hl/projects/.*/var/.*/write$
So far so good. But I fail when I try to match topic 1 without also matching topic 2, using the following regex:
^(cd/hl/projects/.*/var/.*)(?!/write)$
What I think it should do (but it doesn't):
(cd/hl/projects/.*/var/.*) #match any projects and var_names
(?!/write) #not match if /write appears
The problem is that I can't stop matching strings that have /write in the end like:
cd/hl/projects/4d69439d-8c13-4e83-9ed5-60659d953f9f/var/test_count_one/write
I just want not to match the string above but only with:
cd/hl/projects/{project_id}/var/{var_name}
My questions are: Am I following the right approach? What am I missing? How can I achieve my goal?
Thanks

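Your lookahead never rejects anything: .* is greedy and consumes /write as well, so (?!/write) is only checked at the very end of the string, where nothing follows, and it always succeeds. The subparts should contain no /, not just no \n. Replace .* with [^/\n]* and anchor both patterns at the end:
REGEX 1: ^cd/hl/projects/([^/\n]*)/var/([^/\n]*)$
REGEX 2: ^cd/hl/projects/([^/\n]*)/var/([^/\n]*)/write$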
The negated character class [^/\n]* matches zero or more (due to *) characters other than / and \n.
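As a rough sketch of how the two patterns could be used for dispatching, assuming both are anchored with $ at the end: the class name TopicClassifier and the labels "var", "write" and "unknown" are made up for illustration and are not part of the original system.

```csharp
using System;
using System.Text.RegularExpressions;

public static class TopicClassifier
{
    // Topic 1: cd/hl/projects/{project_id}/var/{var_name}
    private static readonly Regex VarTopic =
        new Regex(@"^cd/hl/projects/([^/\n]*)/var/([^/\n]*)$");

    // Topic 2: the same path followed by /write
    private static readonly Regex WriteTopic =
        new Regex(@"^cd/hl/projects/([^/\n]*)/var/([^/\n]*)/write$");

    // Returns a hypothetical label for the topic type.
    public static string Classify(string topic)
    {
        if (WriteTopic.IsMatch(topic)) return "write";
        if (VarTopic.IsMatch(topic)) return "var";
        return "unknown";
    }

    public static void Main()
    {
        string topic = "cd/hl/projects/4d69439d-8c13-4e83-9ed5-60659d953f9f/var/test_count_one";
        Console.WriteLine(Classify(topic));            // var
        Console.WriteLine(Classify(topic + "/write")); // write

        // The capture groups hold project_id and var_name.
        Match m = VarTopic.Match(topic);
        Console.WriteLine(m.Groups[1].Value);          // 4d69439d-8c13-4e83-9ed5-60659d953f9f
        Console.WriteLine(m.Groups[2].Value);          // test_count_one
    }
}
```

Because neither subpart can contain a /, the first pattern cannot swallow the trailing /write, so the order of the two checks does not matter.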

Regex for period around letters except for three periods (ellipsis)

I'm trying to write a regex to capture periods in the middle of a text (like.this).
I've written a few exceptions for numbers and quotations etc, but I can't figure out how to get it to allow three periods in the middle of a sentence (like...this).
The following should not be a match:
." .“ not...match 7.30
And the following should be a match:
is.match
At the moment my regex looks like this:
(\.[^ 0-9."“])
Hope someone can help me as I'm really stuck.
Kind regards
Edited to make myself more clear
Solution
Ended up using this https://regex101.com/r/NcKJxj/1
(?<=\p{L})(?:\.{1,2}|\.{4,})(?=[^ 0-9."“]\p{L})
The pattern (\.[^ 0-9."“]) already does not match ." or .“ because the negated character class matches a . followed by any character that is not listed in the class.
If you don't want to match ... but do want to match 1, 2 or 4+ dots:
(?<=\p{L})(?:\.{1,2}|\.{4,})(?=[^ 0-9."“]\p{L})
The pattern matches:
(?<=\p{L}) Positive lookbehind, assert any letter to the left
(?:\.{1,2}|\.{4,}) Match either 1, 2 or 4+ dots
(?=[^ 0-9."“]\p{L}) Positive lookahead, assert a character that is not listed in the character class, followed by any letter
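A small sketch for trying the pattern against the examples from the question. The class name PeriodFinder is invented for illustration, and the left curly quote is written as \u201C so that the snippet does not depend on the source file encoding.

```csharp
using System;
using System.Text.RegularExpressions;

public static class PeriodFinder
{
    // Same pattern as above; "" is an escaped double quote in a verbatim string,
    // and \u201C is the left curly quote.
    private static readonly Regex MidTextDots = new Regex(
        @"(?<=\p{L})(?:\.{1,2}|\.{4,})(?=[^ 0-9.""\u201C]\p{L})");

    // Number of matches found in the input.
    public static int CountMatches(string input)
    {
        return MidTextDots.Matches(input).Count;
    }

    public static void Main()
    {
        foreach (string s in new[] { "is.match", "not...match", "7.30", "like....this" })
        {
            Console.WriteLine(s + " -> " + CountMatches(s));
        }
    }
}
```

With these inputs only is.match and like....this produce a match. The three dots are skipped because neither alternative can end directly before a letter there.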

Variable-length lookbehind for backslashes

What seemed to be a simple task ended up not working as expected...
I'm trying to match \$\w+\b, unless it's preceded by an odd number of backslashes.
Examples (only $result should be in the match):
This $result should be matched
This \$result should not be matched
This \\$result should be matched
This \\\$result should not be matched
etc...
The following pattern works:
(?<!\\)(\\\\)*\$\w+\b
However, the even runs of backslashes are included in the match, which is unwanted, so I'm trying to achieve this purely with a variable-length lookbehind, but nothing I have tried so far seems to work.
Can any regex virtuoso here lend a hand?
You may use the following pattern:
(?<!(?:^|[^\\])\\(?:\\\\)*)\$\w+\b
Breakdown of the Lookbehind; i.e., not preceded by:
(?:^|[^\\]) - Beginning of string/line or any character other than backslash.
\\ - Then, one backslash character.
(?:\\\\)* - Then, any even number of backslash characters (including zero).
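The negative lookbehind can be checked against the four examples from the question with a sketch like the one below. The class and method names are hypothetical, and it relies on .NET allowing variable-length lookbehind.

```csharp
using System;
using System.Text.RegularExpressions;

public static class DollarMatcher
{
    // Not preceded by an odd run of backslashes (variable-length lookbehind).
    private static readonly Regex Unescaped = new Regex(
        @"(?<!(?:^|[^\\])\\(?:\\\\)*)\$\w+\b");

    // Returns the matched text, or an empty string when nothing matches.
    public static string FindVariable(string input)
    {
        Match m = Unescaped.Match(input);
        return m.Success ? m.Value : "";
    }

    public static void Main()
    {
        // Verbatim strings, so each backslash below is a literal backslash.
        Console.WriteLine(FindVariable(@"This $result should be matched"));        // $result
        Console.WriteLine(FindVariable(@"This \$result should not be matched"));   // (empty)
        Console.WriteLine(FindVariable(@"This \\$result should be matched"));      // $result
        Console.WriteLine(FindVariable(@"This \\\$result should not be matched")); // (empty)
    }
}
```

Since the backslashes are only inspected inside the lookbehind, the match value is just $result and never includes them.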
Looks like asking the question helped me answer my own question.
The part I don't want to be matched has to be wrapped with a positive lookbehind.
(?<=(?<!\\)(\\\\)*)\$\w+\b
Also works if the $result is at the start of the line.
If anyone has more optimal solutions, shoot!
This regular expression gets the wanted text in the third capture group:
(^| )(\\\\)*(\$\w+\b)
Explanation:
(^| ) Either beginning of line or a space
(\\\\)* An even number of backslash characters, including none
( Start of capture group 3
\$\w+\b The wanted text
) End of capture group 3

How do I select all including sensitive case (regex) in c#?

I have a problem with a regex.
I have a file with tons of lines and a lot of special characters,
this is an Example with all sensitive case 0123456789/*-+.&é"'(-è_çà)=~#{[|`\^#]}²$*ù^%µ£¨¤,;:!?./§<>AZERTYUIOPMLKJHGFDSQWXCVBNazertyuiopmlkjhgfdsqwxcvbn
I tried many regex commands but never get the expected result,
I have to select everything from Example to the end
I tried this command on https://www.regextester.com/ :
\sExample(.*?)+
And when I tried it in C#, the only result I got was: Example
I don't understand why.
Here's a quick chat about greedy and pessimistic:
Here is test data:
Example word followed by another word and then more
Here are two regex:
Example.*word
Example.*?word
The first is greedy. Regex will match Example, then .* consumes everything all the way to the END of the string and then works backwards, spitting one character at a time back out, trying to make the match succeed. It succeeds when Example word followed by another word is matched, the .* having matched word followed by another (and the spaces at either end).
The second is pessimistic (also called lazy); it nibbles forwards along the string one character at a time, trying to match. Regex will match Example, then it takes one more character into the .*? wildcard and checks whether word follows, which it does. So pessimistic matching only consumes a single space, and the full match in pessimistic mode is Example word.
Because you say you want the whole string after Example, I recommend a greedy quantifier, so it immediately takes the whole remaining string and declares a match, rather than nibbling forwards one character at a time (slow).
This, then, will match (and capture) everything after Example:
\sExample(.*)
The brackets make a capture group. In C# we can name the group using ?<namehere> at the start of the brackets, and then everything that .* matches can be retrieved with:
Regex r = new Regex(@"\sExample(?<x>.*)"); // verbatim string, so \s reaches the regex engine
Match m = r.Match("an Exampleblahblah"); // \s needs a whitespace character before Example
Console.WriteLine(m.Groups["x"].Value); //prints: blahblah
Note that . doesn't match a newline unless you pass RegexOptions.Singleline when you create the regex.
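The greedy and pessimistic behaviour described above can be seen side by side with a short sketch on the same test data. The class name GreedyVsLazy is only for illustration.

```csharp
using System;
using System.Text.RegularExpressions;

public static class GreedyVsLazy
{
    public const string Data = "Example word followed by another word and then more";

    // .* runs to the end of the string and backtracks to the LAST "word".
    public static string Greedy()
    {
        return Regex.Match(Data, @"Example.*word").Value;
    }

    // .*? expands one character at a time and stops at the FIRST "word".
    public static string Lazy()
    {
        return Regex.Match(Data, @"Example.*?word").Value;
    }

    public static void Main()
    {
        Console.WriteLine(Greedy()); // Example word followed by another word
        Console.WriteLine(Lazy());   // Example word
    }
}
```

With nothing after a lazy .*? in the pattern, as in the original attempt, the engine is free to stop at zero characters, which is consistent with only Example being returned.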

Need a regex to match a single character or word but should not match substrings

I have the following sample text:
I want to replace all instances of ;,:,,,.,and,a,an,the with pipe | symbol.
So the output should be something like:
I tried with the following regex but i am not getting a generic regex which matches for all:
"\/(^|\\W);($|\\W)\/",
"\/(^|\\W):($|\\W)\/",
"\/(^|\\W),($|\\W)\/",
"\/(^|\\W).($|\\W)\/",
"\/(^|\\W)and($|\\W)\/",
"\/(^|\\W)a($|\\W)\/",
"\/(^|\\W)an($|\\W)\/",
"\/(^|\\W)the($|\\W)\/",
"\/(^|\\W)said($|\\W)\/",
Also tried:
(?<=\s)(;.)
(?<=\s)(:.)
(?<=\s)(,.)
(?<=\s)(..)
(?<=\s)(an.)
(?<=\s)(and.)
But that does not work, please help. Please note that a search for a should match the portion "with a light emitting" but should not match "extraction". Similar behavior is required for the others.
Although there are some ambiguous cases, the regex below matches those characters and words. Be careful with the word-boundary (\b) and non-word-boundary (\B) meta-characters, and note that an?d? also matches ad:
[;.,:]\B|\b(?:an?d?|the)\b
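A minimal sketch of the replacement in C#, assuming the pipe symbol as the replacement text. The sample sentence is made up from the fragments quoted in the question, since the original sample text is not shown, and the class name is hypothetical.

```csharp
using System;
using System.Text.RegularExpressions;

public static class StopwordReplacer
{
    // Punctuation not directly followed by a word character, or one of the whole words.
    private static readonly Regex Targets =
        new Regex(@"[;.,:]\B|\b(?:an?d?|the)\b");

    public static string ReplaceWithPipe(string input)
    {
        return Targets.Replace(input, "|");
    }

    public static void Main()
    {
        // "a" and "the" are replaced as whole words; the "a" inside "extraction" is left alone.
        Console.WriteLine(ReplaceWithPipe("with a light emitting diode; the extraction"));
        // prints: with | light emitting diode| | extraction

        // The dot in a number is followed by a word character, so \B fails and it is kept.
        Console.WriteLine(ReplaceWithPipe("3.5"));
        // prints: 3.5
    }
}
```

The pattern is case-sensitive as written; RegexOptions.IgnoreCase would be needed to also catch The or And.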

Regex - Get matches of #[SomeText] in a string

I want to get all matches of #[SomeText] pattern in a string.
For example, for this string:
here is #[text1] some text #[text2]
I want #[text1] and #[text2].
I'm using Regex Hero to check my pattern matching online,
and my pattern works fine when there's one expression to match,
For example:
here is #[text1] text
but with more than one, I get both matches together with the text in the middle as a single match.
This is my regex:
#\[.*\]
I would appreciate assistance in isolating the occurrences.
The problem here is that you are using a greedy quantifier (*). To capture everything you need, you should use a lazy quantifier (*?) with a global modifier:
/(#\[.*?\])/g
Take a look here https://regex101.com/r/pH0gA5/1
This should work :
#\[(.*?)\]
Details :
(.*?) : match everything in a non-greedy way and capture it.
Because the *? quantifier is lazy (non-greedy), it matches as few characters as possible to allow the overall match attempt to succeed, i.e. text1. For the match attempt that starts at a given position, a lazy quantifier gives you the shortest match.
.* is greedy by default, so it only finds one match, treating "text1] some text #[text2" as the text between the two square brackets.
If you add a question mark after the .*, it will match the minimum number of characters before reaching a ].
So the regex #\[.*?\] does what you want. The square brackets must stay escaped; otherwise [.*?] is read as a character class.
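In C# there is no global modifier; Regex.Matches returns every match. A small sketch, with a hypothetical class name:

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

public static class TagExtractor
{
    // Lazy .*? stops at the first closing bracket after each #[
    private static readonly Regex Tag = new Regex(@"#\[.*?\]");

    public static List<string> Extract(string input)
    {
        var result = new List<string>();
        foreach (Match m in Tag.Matches(input))
        {
            result.Add(m.Value);
        }
        return result;
    }

    public static void Main()
    {
        foreach (string tag in Extract("here is #[text1] some text #[text2]"))
        {
            Console.WriteLine(tag); // #[text1] then #[text2]
        }
    }
}
```

An alternative that avoids backtracking altogether would be #\[[^\]]*\], which cannot run past a closing bracket.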
