Regex match matches one too many characters - c#

I have a need to perform a somewhat strange regular expression replacement. I've just about got it worked out, but not quite.
I need to remove multiple substrings from a string where the substrings to remove are surrounded by square braces [] except where the square braces are followed by two hashtags []##.
For instance, if the original string is:
[phase]This is []a test [I]## of the emergency broadcast system. [28]##[test]xyz
Then the expected output after the regex replace would be:
This is a test [I]## of the emergency broadcast system. [28]##xyz
So far, I've tried a few things, but the closest regex pattern string I've come up with is "\[[^\]]*\][^##]". The problem with this is that it matches one more character than it should. For instance, using the test string above, and doing the regex replace with an empty string, it returns:
his is test [I]## of the emergency broadcast system. [28]##yz
What is the regex pattern string for which I'm searching?

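Your problem is that your additional part,
[^##]
is a character class, so it matches and consumes the next character as long as that character is not #. You need a negative lookahead instead, which checks what follows without consuming it: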
\[[^\]]*\](?!##)
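As a minimal sketch of how the lookahead pattern could be used, here it is applied to the test string from the question. The class and method names (BracketStripper, Strip) are hypothetical and only there for illustration.

```csharp
using System;
using System.Text.RegularExpressions;

public static class BracketStripper
{
    // Matches "[...]" only when it is NOT immediately followed by "##".
    private static readonly Regex Unmarked = new Regex(@"\[[^\]]*\](?!##)");

    // Removes every unmarked bracket pair. The lookahead consumes nothing,
    // so the character after the closing bracket stays in the string.
    public static string Strip(string input) => Unmarked.Replace(input, "");
}
```

Calling BracketStripper.Strip on the original string from the question should give the expected output listed there, with [I]## and [28]## left intact.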

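Replace your "\[[^\]]*\][^##]" with "(\[[^\]]*\])(([^#]{2})?)".
First, your [^##] does not implement the rule "except where the square braces are followed by two hashtags []##", because it only checks a single character. So it has to be changed to two non-# characters.
Second, those two characters are captured in group 2, so put them back when replacing:
var s = Regex.Replace(input, pattern, "$2");
Be aware that because group 2 is optional, this pattern still matches (and removes) a bracket pair that is followed by ##, so [I]## loses its [I]. The negative lookahead version avoids that.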


How to match exactly one or more characters inside boundary

Currently I'm using this pattern: [HelloWorld]{1,}.
So if my input is Hello, it matches.
But if my input is WorldHello, it still matches, which is not right.
So how do I make the input string match exactly the value inside the pattern?
Just get rid of the square brackets, and the comma and you're good to go!
HelloWorld{1}
In regex, what's between square brackets is a character set.
So [HelloWorld] matches 1 character that's in the set [edlorHW].
And .{1,} and .+ both match 1 or more characters.
What you probably want is the literal word.
So the regex would simply be "HelloWorld".
That would match HelloWorld in the string "blaHelloWorldbla".
What if you want the word to be a single word, and not part of a word?
Then you could use word boundaries \b, which indicate the transition between a word character (\w = [A-Za-z0-9_]) and a non-word character (\W = [^A-Za-z0-9_]) or the beginning of a line ^ or the end of a line $.
For example @"\bHelloWorld\b" to get a match from "bla HelloWorld bla" but not from "blaHelloWorldbla".
Note that the regex string this time was preceded by @.
Because in a verbatim string the backslashes don't have to be escaped.
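To make the difference concrete, here is a small sketch contrasting the word-boundary pattern with a fully anchored one. The class and method names are hypothetical. The anchored variant is an assumption about what "match exactly" means in the question.

```csharp
using System;
using System.Text.RegularExpressions;

public static class HelloWorldMatcher
{
    // True when "HelloWorld" appears as a whole word somewhere in the input.
    public static bool ContainsWord(string input) =>
        Regex.IsMatch(input, @"\bHelloWorld\b");

    // True when the whole input is "HelloWorld".
    // Note: in .NET, $ also matches just before a single trailing newline.
    public static bool IsExactly(string input) =>
        Regex.IsMatch(input, @"^HelloWorld$");
}
```

With these, "bla HelloWorld bla" passes ContainsWord but "blaHelloWorldbla" does not, and "WorldHello" fails both checks.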
It seems you need an online regex tester to check your pattern. You could also study the C# regex reference.
Try this pattern:
[a-zA-Z]{1,}
You can test it online

RegEx to find non-existence of white space prefix but not include the character in the match?

So I have the following RegEx for the purpose of finding and adding whitespace:
(\S)(\()
For a string like "SomeText(Somemoretext)", I want to update it to "SomeText (Somemoretext)". The pattern matches "t(", so my replace eliminates the "t" from the string, which is not good. I also do not know what the character could be; I'm merely trying to find the non-existence of whitespace.
Is there a better expression to use, or is there a way to exclude the found character from the returned match so that I can safely replace without catching characters I do not want to replace?
Thanks
I find lookarounds hard to read and would prefer using substitutions in the replacement string instead:
var s = Regex.Replace("test1() test2()", @"(\S)\(", "$1 (");
Debug.Assert(s == "test1 () test2 ()");
$1 inserts the first capture group from the regex into the replacement string which is the non-space character before the opening parenthesis (.
If you need to detect the absence of space before a specific character (such as bracket) after a word, how about the following?
\b(?=[^\s])\(
This will detect word characters ([a-zA-Z0-9_]) that are followed directly by a bracket, without a space.
(If I got your problem correctly) you can replace the full match with " (" and get exactly what you need.
In case you need to look for the absence of spaces before a symbol (like a bracket) in any kind of text (as in the text may be non-word, such as punctuation) you might want to use the following instead.
^(?:\S*)(\()(?:\S*)$
When using this, your result will be in group 1, instead of just full match (which now contains the whole line, if a line is matched).
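Since the question asks how to keep the preceding character out of the match, one more option is a lookbehind, which .NET supports. This is only a sketch. The pattern (?<=\S)\( and the names ParenSpacer and AddSpace are not taken from the answers above.

```csharp
using System;
using System.Text.RegularExpressions;

public static class ParenSpacer
{
    // (?<=\S) asserts that a non-whitespace character precedes the "(",
    // but that character is not part of the match, so only "(" is replaced.
    public static string AddSpace(string input) =>
        Regex.Replace(input, @"(?<=\S)\(", " (");
}
```

An opening parenthesis that already follows a space, or that starts the string, is left alone because the lookbehind fails there.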

Regex to identify MQTT topics

I'm trying to get a regex in C# to parse an mqtt topic to know which action to perform for each topic type we defined in our system. We have two topics that must be differentiated:
cd/hl/projects/{project_id}/var/{var_name}
cd/hl/projects/{project_id}/var/{var_name}/write
{project_id} can be any character but a line break (\n). Can be empty
{var_name} can be any character but a line break (\n). Can be empty
to match string 2 I use the following and it's working in all cases i tested:
^cd/hl/projects/.*/var/.*/write$
So far so good. But I fail when I try to match 1. without also giving a match on 2. using the following regexs:
^(cd/hl/projects/.*/var/.*)(?!/write)$
what I think should do (but it doesn't):
(cd/hl/projects/.*/var/.*) #match any projects and var_names
(?!/write) #not match if /write appears
The problem is that I can't stop matching strings that have /write in the end like:
cd/hl/projects/4d69439d-8c13-4e83-9ed5-60659d953f9f/var/test_count_one/write
I want the string above not to match, and only strings of this form to match:
cd/hl/projects/{project_id}/var/{var_name}
My questions are: Am I following the right approach? What am I missing? How can I achieve my goal?
Thanks
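The subparts should contain no /, not just \n. In your pattern the greedy .* can match /, so it swallows /write, and the lookahead (?!/write) is then tested at the very end of the string, where it always succeeds. You should replace .* with [^/\n]* and use
REGEX 1: ^cd/hl/projects/([^/\n]*)/var/([^/\n]*)$
REGEX 2: ^cd/hl/projects/([^/\n]*)/var/([^/\n]*)/write$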
See the regex demo 1 and regex demo 2.
The [^/\n]* negated character class matches any 0+ (due to *) chars other than / and \n.
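A rough sketch of how the two patterns could drive topic dispatch, assuming both are anchored at the end with $. The enum and class names (TopicKind, TopicClassifier, Classify) are hypothetical and not part of the question's system.

```csharp
using System;
using System.Text.RegularExpressions;

public enum TopicKind { Unknown, Variable, VariableWrite }

public static class TopicClassifier
{
    // [^/\n]* cannot cross a "/", so neither segment can absorb "/write".
    private static readonly Regex VariableTopic =
        new Regex(@"^cd/hl/projects/([^/\n]*)/var/([^/\n]*)$");
    private static readonly Regex WriteTopic =
        new Regex(@"^cd/hl/projects/([^/\n]*)/var/([^/\n]*)/write$");

    // Returns which of the two topic shapes the string has, if any.
    public static TopicKind Classify(string topic)
    {
        if (WriteTopic.IsMatch(topic)) return TopicKind.VariableWrite;
        if (VariableTopic.IsMatch(topic)) return TopicKind.Variable;
        return TopicKind.Unknown;
    }
}
```

Because the segments exclude /, each topic can match only one of the two patterns, so the order of the checks does not matter. The captured groups 1 and 2 hold {project_id} and {var_name} if they are needed.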

Extract only numbers from text

I'm trying to extract only numbers from a string/text. Below is the regex pattern I'm using.
Regex regex = new Regex(@"[\d+]\S+");
string extract_from = " 12 abcd 1-2-3a a123z 1.2.3.4 xyz";
From the string "extract_from" above, the regex is extracting the numbers
12
1-2-3a
123z
1.2.3.4
The regex extracts correctly except for the second and third matches, "1-2-3a" and "123z", which shouldn't be extracted because they contain letters. What can I add to the pattern so that numbers with letters in them are not extracted?
Dashes and dots are OK, just not letters.
Here, change the \S in the regex to \s; notice the case.
\S matches anything but whitespace, \s matches whitespace.
Regex regex = new Regex(@"[\d+]\s+");
Try this one:
[0-9\-.]+\s+
That will allow expressions with more than one decimal, and dashes inside them, vs just at the beginning.
You can use regexhero.net or www.regexplanet.com to test your regex expressions, they're very powerful tools.
Output from your given input would be the following matches:
12
1.2.3.4
Edit, based on comment from OP
This regex shouldn't require a space at the beginning. If you need to match a number at the end of the line, it's probably simplest to just add a special case for it:
[0-9\-.]+\s|[0-9\-.]+$
Use this pattern to catch anything but letters:
(?!\S*[a-zA-Z])\b([^a-zA-Z\s]+)\b
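Another way to express the same requirement is with lookarounds on both sides of the token. This is only a sketch. The pattern and the names NumberExtractor and Extract are assumptions for illustration. It assumes a "number" is a whitespace-delimited token that starts with a digit and otherwise contains only digits, dots and dashes.

```csharp
using System;
using System.Linq;
using System.Text.RegularExpressions;

public static class NumberExtractor
{
    // (?<!\S) : token starts at the beginning of the text or after whitespace
    // \d[\d.-]* : a digit followed by any mix of digits, dots and dashes
    // (?!\S) : token ends at the end of the text or before whitespace
    private static readonly Regex NumericToken =
        new Regex(@"(?<!\S)\d[\d.-]*(?!\S)");

    // Returns every token that consists only of digits, dots and dashes.
    public static string[] Extract(string text) =>
        NumericToken.Matches(text).Cast<Match>().Select(m => m.Value).ToArray();
}
```

Unlike the patterns that end in \s, the matches here carry no trailing whitespace, and a number at the very end of the text needs no special case.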

RegEx : Find match based on 1st two chars

I am new to RegEx and thus have a question on RegEx. I am writing my code in C# and need to come up with a regex to find matching strings.
The possible combination of strings i get are,
XYZF44DT508755
ABZF44DT508755
PQZF44DT508755
So what i need to check is whether the string starts with XY or AB or PQ.
I came up with this one and it doesn't work.
^((XY|AB|PQ).){2}
Note: I don't want to use regular string StartsWith()
UPDATE:
Now if i want to try a new matching condition like this -
If string starts with "XY" or "AB" or "PQ" and 3rd character is "Z" and 4th character is "F"
How to write the RegEx for that?
You can modify your expression to the following and use the IsMatch() method.
Regex.IsMatch(input, "^(?:XY|AB|PQ)")
The outer capturing group in conjunction with . (any single character) is trying to match a third character and then repeat the sequence twice because of the quantifier {2}.
According to your updated edit, you can simply place "ZF" after the grouping construct.
Regex.IsMatch(input, "^(?:XY|AB|PQ)ZF")
You want to test for just ^(XY|AB|PQ). Your RegEx means: Search for either XY, AB or PQ, then a random character, and repeat the whole sequence twice, for example "XYKPQL" would match your RegEx.
Breaking the pattern down (as shown on regex101):
^ forces the start of line,
(...) creates a matching group and
XY|AB|PQ matches either XY, AB or PQ.
If you want the next two characters to be ZF, just append ZF to the RegEx so it becomes ^(XY|AB|PQ)ZF.
Check out regex101, a great way to test your RegExes.
You were on the right track. ^(XY|AB|PQ) should match your string correctly.
The problem with ^((XY|AB|PQ).){2} is following the entire group with {2}. This means exactly 2 occurrences. That would be 2 occurrences of your first 2 characters, plus . (any single character), meaning this would match strings like XY_AB_. The _ could be anything.
It may have been your intention with the . to match a larger string. In this case you might try something along the lines of ^((XY|AB|PQ)\w*). The \w* will match 0 or more occurrences of "word characters", so this should match all of XYZF44DT508755 up to a space, line break, punctuation, etc., and not just the XY at the beginning.
There are some good tools out there for understanding regexes, one of my favorites is debuggex.
UPDATE
To answer your updated question:
If string starts with "XY" or "AB" or "PQ" and 3rd character is "Z" and 4th character is "F"
The regex would be (assuming you want to match the entire "word").
^((XY|AB|PQ)ZF\w*)
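The behaviour described in these answers can be checked with a short sketch. The class and method names are hypothetical. The patterns are the ones quoted above.

```csharp
using System;
using System.Text.RegularExpressions;

public static class PrefixCheck
{
    // Starts with XY, AB or PQ.
    public static bool HasPrefix(string input) =>
        Regex.IsMatch(input, "^(?:XY|AB|PQ)");

    // Starts with XY, AB or PQ, then Z as 3rd and F as 4th character.
    public static bool HasPrefixThenZF(string input) =>
        Regex.IsMatch(input, "^(?:XY|AB|PQ)ZF");

    // The pattern from the question: (prefix + any character), twice.
    public static bool OriginalAttempt(string input) =>
        Regex.IsMatch(input, "^((XY|AB|PQ).){2}");
}
```

The original attempt rejects XYZF44DT508755 because, after consuming XYZ, it needs another XY, AB or PQ at position 4, while it accepts a string such as XYKPQL.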
