Can a regex return interleaved matches? [duplicate] - c#

This question already has answers here:
How to find overlapping matches with a regexp?
(4 answers)
Closed 5 years ago.
I have text with opening and closing tags,
e.g. /*tag1_START*/ some content /*tag1_END*/ other text /*tag2_START*/ some content /*tag2_END*/
and I use the regex \/\*([a-zA-Z0-9]+)_START\*\/(.*?)\/\*\1_END\*
(see regex101).
But there was a case where the tags were (mistakenly) interleaved:
e.g. /*tag3_START*/ some /*tag4_START*/ content /*tag3_END*/ other /*tag4_END*/ content
I can easily check the matches for overlap, but the regex does not return both tags, because each search resumes after the last character of the previous match...
Can I use a regex to find overlapping matches, or do I need to write my own code?

Lookarounds assert rather than consume characters, but capturing groups inside them still store what they matched. So put the overlapping part inside a positive lookahead:
\/\*([a-zA-Z0-9]+)_START\*\/(?=(.*?)\/\*\1_END\*)
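A minimal C# sketch of the lookahead approach, run against the interleaved example from the question. Assumptions: the class `InterleavedTags` and method `Find` are hypothetical names, and the pattern is the one from this answer written as a verbatim string, with the unneeded `\/` escapes dropped and the trailing `/` of the closing tag included. Because only the opening tag is consumed, `Regex.Matches` can start the next match at `/*tag4_START*/`, which lies inside the first pair's content.

```csharp
using System.Collections.Generic;
using System.Text.RegularExpressions;

public static class InterleavedTags
{
    // Only the opening tag is consumed. The content (group 2) and the closing
    // tag sit inside a positive lookahead, so the next match may begin inside
    // them, e.g. at an interleaved opening tag.
    private static readonly Regex TagPattern = new Regex(
        @"/\*([a-zA-Z0-9]+)_START\*/(?=(.*?)/\*\1_END\*/)");

    // Returns (tag name, content) for every START/END pair, interleaved or not.
    public static List<(string Tag, string Content)> Find(string input)
    {
        var result = new List<(string Tag, string Content)>();
        foreach (Match m in TagPattern.Matches(input))
        {
            result.Add((m.Groups[1].Value, m.Groups[2].Value));
        }
        return result;
    }
}
```

With the interleaved input this yields two pairs: `tag3` with content `" some /*tag4_START*/ content "` and `tag4` with content `" content /*tag3_END*/ other "`.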

(?=\*([a-zA-Z0-9]+)_START\*\/(.*?)\/\*(\1)_END\*)
You have to use a lookahead so that the match consumes nothing (the groups still capture). See demo:
https://regex101.com/r/vsA3ZU/1
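When the whole pattern sits inside a lookahead, every match is zero-width (`Match.Length` is 0), but each `Group` still reports its `Index` and `Length`. That is enough for the overlap check mentioned in the question. A sketch under these assumptions: `ZeroWidthTags` and `FindInterleaved` are hypothetical names, the pattern is this answer's with the leading and trailing `/` of the tags added, and "interleaved" is taken to mean that a later pair's content starts inside an earlier pair's content but ends after it (properly nested pairs are not reported).

```csharp
using System.Collections.Generic;
using System.Text.RegularExpressions;

public static class ZeroWidthTags
{
    // Entire pattern is a lookahead: each match is empty, the groups are not.
    private static readonly Regex Pattern = new Regex(
        @"(?=/\*([a-zA-Z0-9]+)_START\*/(.*?)/\*\1_END\*/)");

    // Returns "tagA/tagB" for each pair of tags whose content spans interleave.
    public static List<string> FindInterleaved(string input)
    {
        // Content span of each tag pair, in order of the opening tag.
        var spans = new List<(string Tag, int Start, int End)>();
        foreach (Match m in Pattern.Matches(input))
        {
            Group content = m.Groups[2];
            spans.Add((m.Groups[1].Value, content.Index, content.Index + content.Length));
        }

        var interleaved = new List<string>();
        for (int i = 0; i < spans.Count; i++)
        {
            for (int j = i + 1; j < spans.Count; j++)
            {
                var a = spans[i];
                var b = spans[j];
                // b starts inside a's content but ends after it: partial overlap.
                if (b.Start < a.End && b.End > a.End)
                {
                    interleaved.Add(a.Tag + "/" + b.Tag);
                }
            }
        }
        return interleaved;
    }
}
```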

Related

Regex.Replace with groups duplicated output? [duplicate]

This question already has answers here:
String.replaceAll(regex) makes the same replacement twice
(2 answers)
Closed 4 years ago.
I have a weird problem with Regex.Replace.
I think my immediate window says it all:
pattern
"([^_]*)(.*)"
fileNameToReplicate
"{Productnr}_LEI1.JPG"
Regex.Replace(fileNameToReplicate, pattern, $"$1")
"{Productnr}"
Regex.Replace(fileNameToReplicate, pattern, $"$2")
"_LEI1.JPG"
Regex.Replace(fileNameToReplicate, pattern, $"sometext$2")
"sometext_LEI1.JPGsometext"
So my pattern looks for the first underscore and captures everything before it in group 1.
Then it captures the rest of the text (from that underscore to the end of the string) as group 2.
The regex captures correctly.
Why is the prefix text output twice, once before the group and once after it? I expected this output:
"sometext_LEI1.JPG"
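It does not matter how many star-quantified groups occur in sequence:
(.*)(.*)(.*)(...
All of them can match the empty string. After the first match consumes the whole input, the engine tries again at the end of the subject string and finds a second, empty match there. Regex.Replace replaces both matches, so "sometext" is inserted a second time, followed by an empty $2. To get your expected result, change your pattern to: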
^([^_]*)(.*)
The caret anchors the match to the start of the string, which stops the engine from starting a second match at the end of the input.
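A small C# sketch reproducing the immediate-window results and the fix. `ReplaceDemo` and its methods are hypothetical names. Besides the caret, it shows one other option that uses a real overload, `Regex.Replace(string input, string replacement, int count)`, which limits replacement to the first match while keeping the original pattern.

```csharp
using System.Text.RegularExpressions;

public static class ReplaceDemo
{
    private const string Input = "{Productnr}_LEI1.JPG";

    // Unanchored: matches the whole string, then an empty match at the end.
    public static string Unanchored() =>
        Regex.Replace(Input, "([^_]*)(.*)", "sometext$2");

    // Anchored with ^: the only possible match starts at position 0.
    public static string Anchored() =>
        Regex.Replace(Input, "^([^_]*)(.*)", "sometext$2");

    // Same unanchored pattern, but replace at most one match.
    public static string FirstMatchOnly() =>
        new Regex("([^_]*)(.*)").Replace(Input, "sometext$2", 1);

    // How many matches the unanchored pattern finds in the input.
    public static int UnanchoredMatchCount() =>
        Regex.Matches(Input, "([^_]*)(.*)").Count;
}
```

`Unanchored()` returns `"sometext_LEI1.JPGsometext"` as in the question, `UnanchoredMatchCount()` is 2, and the other two methods return `"sometext_LEI1.JPG"`.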

What does the regex pattern "\\[.*\\]" mean? [duplicate]

This question already has an answer here:
Reference - What does this regex mean?
(1 answer)
Closed 6 years ago.
I am new to regex. What does the pattern "\[.*\]" mean?
If I have text like "Hello [Here]", the match succeeds and contains [Here].
I read that:
. matches any character except \n (newline),
* means 0 or more times.
I don't understand the "\". I believe it is just an escape sequence for "\".
So, is the expression "\[.*\]" trying to match a pattern like \[Any text\]?
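Yes, essentially. In the C# string literal "\\[.*\\]", each \\ produces a single backslash, so the regex engine receives \[.*\]. There, \[ and \] match literal square brackets (an unescaped [ would start a character class), and .* matches any characters, or none, between them. So it matches any text enclosed in [].
You should also try this link; it is a very helpful regex tool where you can enter a regex pattern and check for matches easily.
I have tried this on regexr.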
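A short C# sketch of what the pattern does. `BracketDemo` and its members are hypothetical names. It also shows that `.*` is greedy: with two bracketed parts on one line it runs to the last `]`, whereas the lazy `.*?` stops at the first one.

```csharp
using System.Text.RegularExpressions;

public static class BracketDemo
{
    // In a regular C# literal "\\" is one backslash, so the engine receives
    // \[.*\], exactly what the verbatim literal @"\[.*\]" contains.
    public const string Pattern = "\\[.*\\]";

    // Greedy: from the first '[' to the last ']' on the line.
    public static string FirstMatch(string input) =>
        Regex.Match(input, Pattern).Value;

    // Lazy variant: .*? stops at the first closing bracket.
    public static string FirstMatchLazy(string input) =>
        Regex.Match(input, @"\[.*?\]").Value;
}
```

For `"Hello [Here]"` both methods return `"[Here]"`; for `"[a] and [b]"` the greedy one returns the whole string and the lazy one returns `"[a]"`.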

Regex to match XML elements in a text file [duplicate]

This question already has answers here:
Closed 10 years ago.
Possible Duplicate:
RegEx match open tags except XHTML self-contained tags
I have a text file consisting of conversion instruction templates.
I need to parse this file and match something like this:
(Source: <element>)
and get "element".
Or this pattern:
(Source: <element attr="name" value=""/>)
and get "element attr="name"".
I am currently using this regex:
\(Source:\ ?\<(.*?)\>\)
Sorry for being a newbie. :)
Thanks for all your help.
-JRC
Try this regex, which detects attributes quoted with either ” or " characters:
\(Source:\s+<(\w+(?:\s+\w+=["”][^"”]+["”])?)[^>]*>\)
and in your code:
var result = Regex.Match(strInput,
    "\\(Source:\\s+<(\\w+(?:\\s+\\w+=[\"”][^\"”]+[\"”])?)[^>]*>\\)")
    .Groups[1].Value;
Explanation:
(subexpression)
Captures the matched subexpression and assigns it a zero-based ordinal number.
?
Matches the previous element zero or one time.
\w
Matches any word character.
+
Matches the previous element one or more times.
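A sketch of the extraction as a reusable C# helper, checked against both template lines from the question. Assumptions: `SourceTemplate` and `Extract` are hypothetical names, and the pattern is written with the whitespace before the attribute moved inside the optional group, `(\w+(?:\s+\w+=...)?)`, so that a bare `<element>` (no whitespace after the name) also matches and group 1 carries no trailing space.

```csharp
using System.Text.RegularExpressions;

public static class SourceTemplate
{
    // Group 1: element name, optionally followed by its first quoted attribute
    // (either " or ” quotes). The rest of the tag is skipped by [^>]*.
    private static readonly Regex SourcePattern = new Regex(
        "\\(Source:\\s+<(\\w+(?:\\s+\\w+=[\"”][^\"”]+[\"”])?)[^>]*>\\)");

    // Returns group 1, or "" when the line does not match
    // (Group.Value is empty for an unsuccessful match).
    public static string Extract(string line) =>
        SourcePattern.Match(line).Groups[1].Value;
}
```

`Extract("(Source: <element>)")` returns `element`, and the second template line returns `element attr="name"`.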

Filter out alphabetic characters with regex using C# [duplicate]

This question already has answers here:
Closed 10 years ago.
Possible Duplicate:
Regex - Only letters?
I am trying to filter out alphabetic characters ([a-z], [A-Z]) from text.
I tried "^\w$", but \w matches alphanumeric characters (letters, digits, and underscore).
What is the pattern to filter out alphabetic characters?
Thanks.
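To remove all letters, try this:
using System;
using System.Text.RegularExpressions;
class Program
{
    static void Main()
    {
        var str = "some junk456456%^&%*333";
        // Removes every ASCII letter; prints " 456456%^&%*333"
        Console.WriteLine(Regex.Replace(str, "[a-zA-Z]", ""));
    }
}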
To keep only English letters, match everything else and replace it with ""; this pattern matches runs of characters that are not English letters:
[^a-zA-Z]+
To do the same regardless of language, use the Unicode letter category:
[^\p{L}]+
To match the letters themselves instead, remove the caret ^ right after the opening bracket.
If you want to match whole lines, enclose the patterns above in ^ and $; otherwise you don't need them. To make ^ and $ apply to every line rather than only to the start and end of the whole string, create the Regex object with RegexOptions.Multiline.
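A C# sketch contrasting the ASCII class with `\p{L}` and showing the effect of `RegexOptions.Multiline` on `^` and `$`. `LetterFilters` and its methods are hypothetical names; the German sample word is an assumption chosen only because it contains non-ASCII letters.

```csharp
using System.Collections.Generic;
using System.Linq;
using System.Text.RegularExpressions;

public static class LetterFilters
{
    // Deletes every run of characters outside a-z/A-Z, keeping ASCII letters only.
    public static string KeepAsciiLetters(string s) =>
        Regex.Replace(s, "[^a-zA-Z]+", "");

    // Same idea with the Unicode letter category, so letters like ü and ß survive.
    public static string KeepAnyLetters(string s) =>
        Regex.Replace(s, @"[^\p{L}]+", "");

    // Lines made up only of ASCII letters. Multiline lets ^ and $ match at
    // each line's start and end, not just at the ends of the whole string.
    public static List<string> LetterOnlyLines(string s) =>
        Regex.Matches(s, "^[a-zA-Z]+$", RegexOptions.Multiline)
             .Cast<Match>()
             .Select(m => m.Value)
             .ToList();
}
```

For `"Grüße 123!"`, `KeepAsciiLetters` returns `"Gre"` while `KeepAnyLetters` returns `"Grüße"`. For `"abc\n12\nxyz"`, `LetterOnlyLines` finds `abc` and `xyz`; without `Multiline` the same pattern finds nothing in that input.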
Try this simple approach, which removes everything except letters and whitespace:
var result = Regex.Replace(inputString, @"[^a-zA-Z\s]+", "");
Explanation:
+
Matches the previous element one or more times.
[^character_group]
Negation: Matches any single character that is not in character_group.
\s
Matches any white-space character.
To match a string made up only of one or more letters, use:
^[a-zA-Z]+$

How do I write a regex to match a string that doesn't contain a word? [duplicate]

This question already has answers here:
Closed 10 years ago.
Possible Duplicate:
Regular expression to match string not containing a word?
To exclude a set of characters I would use, e.g., [^\"\\r\\n]*
Now I also want to exclude a fixed character sequence, e.g. "|=".
In other words, I want to match: (not ", not \r, not \n, and not |=).
EDIT: I am trying to modify the regex for parsing data separated with delimiters. I got the single-delimiter solution from a CSV parser, but now I want to extend it to multi-character delimiters. I do not think lookaheads will work, because I want to consume the matching characters, not just assert and discard them.
I figured it out; it should be: ((?![\"\\r\\n]|[|][=]).)*
The negative lookahead only asserts that no ", \r, \n or |= starts at the current position; the . after it is what consumes the character, and * repeats that check-then-consume step.
The full regex, modified from the CSV parser link in the original post, is: ((?<field>((?![\"\\r\\n]|[|][=]).)*)|\"(?<field>([^\"]|\"\")*)\")([|][=]|(?<rowbreak>\\r\\n|\\n|$))
This matches any number of characters that are (not ", not \r, not \n, and not |=), or a quoted string, followed by ("|=" or end of line).
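A minimal C# check of the unquoted-field part on its own, showing that the lookahead lets a lone `|` or `=` through but stops before the two-character delimiter `|=`. Assumptions: `DelimitedField` and `FirstField` are hypothetical names, and only the first alternative of the full regex is exercised, not the quoted-field branch or the row-break handling.

```csharp
using System.Text.RegularExpressions;

public static class DelimitedField
{
    // Before each '.' consumes a character, the negative lookahead checks that
    // neither ", \r, \n nor the sequence "|=" starts at this position.
    private static readonly Regex UnquotedField =
        new Regex("((?![\"\\r\\n]|[|][=]).)*");

    // The pattern can match the empty string, so the first match is always at
    // index 0: the unquoted field at the start of the input.
    public static string FirstField(string input) =>
        UnquotedField.Match(input).Value;
}
```

`FirstField("a|b=c|=d")` returns `"a|b=c"`, while a field starting with `"` yields `""`, which is where the quoted-string alternative of the full regex takes over.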
