Regex and taking care of possible whitespace

I need help with my regular expression. I am trying to match the keyword this, but only when it is in parentheses: (this). So far I have:
(\()\bthis\b(\))
But it looks like it also matches the parentheses wrapping the word, while I only need to grab the word itself. Another issue is that it won't work if there is whitespace inside the parentheses: ( this )

What about using a capture group, with an expression like this:
\((this)\)
Also, if you want to match when there is whitespace (spaces, tabs, etc.):
\(\s*(this)\s*\)
Try it out here: Regex101. Every character used in the regex is explained in detail on that site.
You can retrieve the matched this value from the group in code; in C#, that is match.Groups[1].Value. Please check the documentation for the language you're using for details.
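As a C# sketch of the answer above (the sample inputs are made up), the word is read from capture group 1 while the whole match still contains the parentheses. Because the question asked for the word alone, a lookaround variant is included for comparison; it relies on .NET allowing variable-length lookbehind and is an alternative, not part of the original answer.

```csharp
using System;
using System.Text.RegularExpressions;

// \( and \) match the literal parentheses, \s* allows optional whitespace
// (spaces, tabs, ...) inside them, and (this) is capture group 1.
Regex keyword = new Regex(@"\(\s*(this)\s*\)");

// Alternative: .NET supports variable-length lookbehind, so the parentheses
// can be required without being part of the match at all.
Regex wordOnly = new Regex(@"(?<=\(\s*)this(?=\s*\))");

foreach (string input in new[] { "call(this)", "call( this )", "call(\tthis )", "this alone" })
{
    Match m = keyword.Match(input);
    // m.Value still includes the parentheses; m.Groups[1].Value is just the word.
    string viaGroup = m.Success ? m.Groups[1].Value : "(no match)";
    string viaLookaround = wordOnly.Match(input).Value; // "" when there is no match
    Console.WriteLine($"{input}: group 1 = {viaGroup}, lookaround = {viaLookaround}");
}
```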

Related

How can I use unnamed Regex groups in C# inside my regex?

Hey, so my current regex is @"(into)(to)add\s[^\s]{1,}\1|\2[^\s]{1,}". I want the input to be something like "add word into/to category". The regex in general works fine, except for the \1|\2 part. I tried using groups and all sorts of solutions, but I just can't figure out how to make it accept either into or to.
Can anyone help me out? (This is in C#, using the Regex class.)

If I have understood you correctly, you don't need backreferences to (unnamed) groups; you can use a simple alternation, like this:
@"add \w+ (into|to) \w+"
That will match either into or to in the search string.
Edit:
Let's get a little more 'advanced', using the optional quantifier '?':
@"add \w+ (in)?to \w+"
This will match 'in' zero or one time, followed by 'to', so it will match into as well as to, exactly like the first regex.
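A small C# check of the two patterns above, using made-up sample sentences, shows that both accept into and to and reject other words; it also shows what capture group 1 holds in each case.

```csharp
using System;
using System.Text.RegularExpressions;

// Two equivalent ways to accept either "into" or "to" between the words:
// an alternation, and an optional "in" prefix in front of "to".
Regex alternation = new Regex(@"add \w+ (into|to) \w+");
Regex optionalIn = new Regex(@"add \w+ (in)?to \w+");

string[] inputs = { "add word into category", "add word to category", "add word onto category" };
foreach (string input in inputs)
{
    Match a = alternation.Match(input);
    Match b = optionalIn.Match(input);
    // For the alternation, group 1 is the whole connector ("into" or "to");
    // for (in)?to, group 1 is only "in", and it does not participate for plain "to".
    Console.WriteLine($"{input}: alternation={a.Success} ({a.Groups[1].Value}), optional={b.Success} ({b.Groups[1].Value})");
}
```

If you need the whole connector captured with the optional form, wrapping it as ((?:in)?to) puts "into" or "to" into group 1.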
Edit2:
I have a feeling you want to use a variable inside your RegEx. You can of course do that like this:
using System.Text.RegularExpressions;
string search = "into|to";
Regex regex = new Regex(@"add \w+ (" + search + @") \w+");
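If the accepted words come from a list rather than a fixed string, one way to build the alternation is sketched below. The connectors array is hypothetical, and Regex.Escape is an addition not found in the original answer; it keeps any word that contains metacharacters such as '.' or '+' literal.

```csharp
using System;
using System.Linq;
using System.Text.RegularExpressions;

// Hypothetical list of accepted connector words; in practice it might
// come from configuration or user input.
string[] connectors = { "into", "to" };

// Escape each word, then join them into the alternation "into|to".
string search = string.Join("|", connectors.Select(Regex.Escape));

// Every literal piece containing \w needs the verbatim (@) prefix,
// otherwise the C# compiler rejects "\w" as an unrecognized escape sequence.
Regex regex = new Regex(@"add \w+ (" + search + @") \w+");

Match m = regex.Match("add word into category");
Console.WriteLine(m.Success ? $"connector: {m.Groups[1].Value}" : "no match");
```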

From your example, I think you're looking for a regex like add\s\w+\s(into|to)\s\w+. Your current regex requires the literal text "intoto" immediately followed by "add" in the input, which is probably not what you want.

RegEx to find non-existence of white space prefix but not include the character in the match?

So I have the following RegEx for the purpose of finding the spot where whitespace is missing and adding it:
(\S)(\()
For a string like "SomeText(Somemoretext)" I want to update this to "SomeText (Somemoretext)". It matches "t(", so my replace eliminates the "t" from the string, which is not good. I also do not know what the character could be; I'm merely trying to find the absence of whitespace.
Is there a better expression to use, or is there a way to exclude the found character from the match returned, so that I can safely replace without catching characters I do not want to replace?
Thanks

I find lookarounds hard to read and would prefer using substitutions in the replacement string instead:
var s = Regex.Replace("test1() test2()", @"(\S)\(", "$1 (");
Debug.Assert(s == "test1 () test2 ()");
$1 inserts the first capture group from the regex into the replacement string; here, that is the non-space character before the opening parenthesis.
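For comparison, the sketch below runs the substitution approach and a lookbehind (the "exclude the found character from the match" idea from the question) on the question's own sample string. The lookbehind variant is an alternative, not something this answer proposed.

```csharp
using System;
using System.Text.RegularExpressions;

string input = "SomeText(Somemoretext)";

// Option 1: capture the preceding character and put it back with $1.
string viaGroup = Regex.Replace(input, @"(\S)\(", "$1 (");

// Option 2: the lookbehind (?<=\S) requires a non-space character before "("
// without consuming it, so the match is only "(" and nothing else is lost.
string viaLookbehind = Regex.Replace(input, @"(?<=\S)\(", " (");

Console.WriteLine(viaGroup);
Console.WriteLine(viaLookbehind);
```

Both leave already-spaced text such as "a (b)" unchanged, because the character before the bracket is a space.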

If you need to detect the absence of a space before a specific character (such as a bracket) after a word, how about the following?
\b(?=[^\s])\(
This will detect a bracket that directly follows a word character (a letter, digit or underscore, roughly [a-zA-Z0-9_]), with no space in between.
If I got your problem right, you can replace the full match with " (" (a space followed by the bracket) and get exactly what you need.
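A quick C# check of the word-boundary pattern above; the second sample string is made up to show how it differs from the \S approach when the bracket follows punctuation.

```csharp
using System;
using System.Text.RegularExpressions;

// \b sits between a word character and the "(" (a non-word character), so the
// match is just the bracket when it directly follows a letter, digit or underscore.
// The (?=[^\s]) lookahead is always satisfied by the "(" itself, so it does not
// change the result here.
const string Pattern = @"\b(?=[^\s])\(";

string fixedText = Regex.Replace("SomeText(Somemoretext)", Pattern, " (");

// After punctuation such as "]" there is no word boundary, so no space is added.
string punctuation = Regex.Replace("x](y) and f(z)", Pattern, " (");

Console.WriteLine(fixedText);
Console.WriteLine(punctuation);
```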
In case you need to look for the absence of spaces before a symbol (like a bracket) in any kind of text (that is, the preceding character may be a non-word character, such as punctuation), you might want to use the following instead.
^(?:\S*)(\()(?:\S*)$
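When using this, your result will be in group 1 instead of the full match (which now contains the whole line, if a line is matched). Note that, because of the ^ and $ anchors, it only matches a line that contains no whitespace at all.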
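To see where the bracket sits when using the anchored pattern, its position can be read from Groups[1].Index. This sketch assumes RegexOptions.Multiline so that ^ and $ apply per line; the two-line sample text is made up.

```csharp
using System;
using System.Text.RegularExpressions;

// Anchored at both ends, the pattern only matches a line that contains a "("
// and no whitespace at all; the bracket itself is capture group 1.
Regex noSpaceLine = new Regex(@"^(?:\S*)(\()(?:\S*)$", RegexOptions.Multiline);

string text = "SomeText(Somemoretext)\nSome text(more)";
foreach (Match m in noSpaceLine.Matches(text))
{
    // m.Value is the whole line; Groups[1].Index is the bracket's position in text.
    Console.WriteLine($"line \"{m.Value}\", bracket at index {m.Groups[1].Index}");
}
```

The second line is skipped entirely because it contains a space, even though its bracket also has no space in front of it.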

Regular expression match text between tag

I need help with a regular expression, as I do not have good knowledge of regex.
I have regular expression as:
Regex myregex = new Regex("testValue=\"(.+?)\"");
What does (.+?) indicate?
The string it matches is testValue="123e4567", and it returns 123e4567 as output.
Now I need help writing a regular expression to match the string "<helpMe>123e4567</helpMe>", where I need 123e4567 as output. How do I write a regular expression for it?

This means:
( Begin captured group
. Match any character
+ One or more times
? Non-greedy quantifier
) End captured group
In the case of your regex, the non-greedy quantifier ? means that your captured group will begin after the first double-quote, and then end immediately before the very next double-quote it encounters. If it were greedy (without the ?), the group would extend to the very last double-quote it encounters on that line (i.e., "greedily" consuming as much of the line as possible).
For your "helpMe" example, you'd want this regex:
<helpMe>(.+?)</helpMe>
Given this string:
<div>Something<helpMe>ABCDE</helpMe></div>
You'd get this match:
ABCDE
The value of the non-greedy quantifier is evident in this variation:
Regex: <helpMe>(.+)</helpMe>
String: <div>Something<helpMe>ABCDE</helpMe><helpMe>FGHIJ</helpMe></div>
The greedy capture would look like this:
ABCDE</helpMe><helpMe>FGHIJ
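The lazy and greedy behaviour described above can be checked in C# with the same sample strings; the testValue line uses the question's original pattern.

```csharp
using System;
using System.Text.RegularExpressions;

// The question's original pattern: group 1 stops at the next double quote.
Match attr = Regex.Match("testValue=\"123e4567\"", "testValue=\"(.+?)\"");
Console.WriteLine($"attr:   {attr.Groups[1].Value}");

string html = "<div>Something<helpMe>ABCDE</helpMe><helpMe>FGHIJ</helpMe></div>";

// Lazy: .+? stops at the first "</helpMe>", so each tag pair is its own match.
// (Without RegexOptions.Singleline, "." does not match newlines.)
MatchCollection lazy = Regex.Matches(html, "<helpMe>(.+?)</helpMe>");
foreach (Match m in lazy)
    Console.WriteLine($"lazy:   {m.Groups[1].Value}");

// Greedy: .+ runs as far as possible and backtracks only to the last "</helpMe>".
Match greedy = Regex.Match(html, "<helpMe>(.+)</helpMe>");
Console.WriteLine($"greedy: {greedy.Groups[1].Value}");
```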
There are some useful interactive tools to play with these variations:
Regex Tester
Regex Pal

Ken Redler has a great answer regarding your first question. For the second question try:
<(helpMe)>(.*?)</\1>
Using the backreference \1, you can find values between a matching pair of tags. The first group captures the tag name, the second group matches the content itself, and the \1 backreference re-uses the first group's match (in this case, the tag name).
Also, in C# you can use named groups, like: <(helpMe)>(?<value>.*?)</\1> where now match.Groups["value"].Value contains your value.
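A C# sketch of the named-group version. It generalizes the tag name from helpMe to (\w+), which is an assumption beyond the answer; it relies on .NET numbering unnamed groups before named ones, so \1 still refers to the tag name.

```csharp
using System;
using System.Text.RegularExpressions;

// (\w+) captures the tag name as group 1, \1 requires the same name in the
// closing tag, and (?<value>...) names the content group so it can be read by name.
Regex tagPair = new Regex(@"<(\w+)>(?<value>.*?)</\1>");

Match m = tagPair.Match("<helpMe>123e4567</helpMe>");
Console.WriteLine($"{m.Groups[1].Value}: {m.Groups["value"].Value}");

// Mismatched tags do not match, because \1 must repeat the opening name.
Console.WriteLine(tagPair.IsMatch("<a>x</b>"));
```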

What does (.+?) indicate?
It means match any character (.) one or more times (+), but as few times as possible (the ? makes the quantifier lazy).
A simple regex to match your second string would be
<helpMe>([a-z0-9]+)<\/helpMe>
This will match any run of lowercase letters (a-z) and digits between <helpMe> and </helpMe>.
The parentheses are used to capture a group. This is useful if you need to reference the value inside this group later.

regex replace - but with a few exceptions

I have a string containing HTML and I need to replace some words to be links - I do this with the following code;
string lNewHTML = Regex.Replace(lOldHTML, @"(\bword1\b|\bword2|word3\b)", "$1", RegexOptions.IgnoreCase);
The code works, but I need to add some exceptions to the replacement: e.g. I will not replace anything in an img, li or a tag (including link text and attributes like href and title), but I still want to allow replacements in p, td and div tags.
Can anyone figure this one out?

OK, after some time trying to construct a fitting regex, here is my attempt. This might need additional work, but it should point you in the right direction.
I am matching the words "word1" and "word2" when they are not inside a "tag1" or "tag2" tag. You need to adjust this to your needs, of course. Enable RegexOptions.IgnorePatternWhitespace if you'd like to keep my formatting.
Unfortunately, I have not come up with a regex you could simply plug into Regex.Replace, since this regex matches everything from the end of the previous match, but the word you are concerned with is in the first group. That group contains the index and length of the word, so you can easily replace it using String.Substring.
(?:
\G
(?:
(?>
<tag1(?<N>)
|<tag2(?<N>)
|</tag1(?<-N>)
|</tag2(?<-N>)
|.)*?
(?(N)(?!))
)*
)
(word1|word2)

You need to use the Replace overload with the MatchEvaluator parameter so that you examine each match and decide whether to replace or not.
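A minimal sketch of the MatchEvaluator approach, assuming simple, non-nested markup; a regex is no substitute for a real HTML parser. The pattern, the sample HTML and the /glossary/ link target are hypothetical (the question's actual replacement markup was not preserved). One alternation branch consumes regions that must stay untouched, the other captures a keyword in ordinary text, and the evaluator only rewrites the second kind of match.

```csharp
using System;
using System.Text.RegularExpressions;

// Branch "skip": a whole <a>...</a> or <li>...</li> element, or any single tag
// (such as <img ...>) including its attributes. (a|li) is unnamed group 1,
// so \1 requires the matching closing tag.
// Branch "word": one of the keywords as a whole word in ordinary text.
Regex pattern = new Regex(
    @"(?<skip><(a|li)\b[^>]*>.*?</\1\s*>|<[^>]+>)|\b(?<word>word1|word2|word3)\b",
    RegexOptions.IgnoreCase | RegexOptions.Singleline);

// Evaluator: leave skipped regions unchanged, turn keywords into links.
// The /glossary/ URL scheme is a hypothetical placeholder.
static string Linkify(Match m)
{
    if (!m.Groups["word"].Success)
        return m.Value;
    string word = m.Groups["word"].Value;
    return $"<a href=\"/glossary/{word.ToLowerInvariant()}\">{word}</a>";
}

string oldHtml = "<p>word1 and <a href=\"/x\" title=\"word2\">word2</a></p>"
               + "<img alt=\"word3\"><li>word3</li><td>word3</td>";
string newHtml = pattern.Replace(oldHtml, Linkify);
Console.WriteLine(newHtml);
```

Text returned by the evaluator is not scanned again, so the inserted links cannot be matched a second time.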

I have two problems, one of them is a regex

I am updating some code that I didn't write and part of it is a regex as follows:
\[url(?:\s*)\]www\.(.*?)\[/url(?:\s*)\]
I understand that .*? does a non-greedy match of everything in the second register.
What does ?:\s* in the first and third registers do?
Update: As requested, language is C# on .NET 3.5

The syntax (?:) is a way of putting parentheses around a subexpression without separately extracting that part of the string.
The author wanted to match the (.*?) part in the middle and didn't want the spaces at the beginning or the end to get in the way. Now you can use \1 or $1 (or whatever the appropriate method is in your particular language) to refer to the domain name, instead of to the first chunk of spaces at the beginning of the string.

?: makes the parentheses non-grouping. In that regex, you'll only pull out one piece of information, $1, which contains the middle (.*?) expression.

What does ?:\s* in the first and third registers do?
It's matching zero or more whitespace characters, without capturing them.
The regex author intends to allow trailing whitespace in the square-bracket tags, matching all DNS labels following the "www.", like so:
[url]www.foo.com[/url] # foo.com
[url ]www.foo.com[/url ] # same
[url ]www.foo.com[/url] # same
[url]www.foo.com[/url ] # same
Note that the regex also matches:
[url]www.[/url] # empty string!
and fails to match
[url]stackoverflow.com[/url] # no match, bummer
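The cases listed above can be reproduced with a short C# loop over the same sample strings:

```csharp
using System;
using System.Text.RegularExpressions;

// (?:\s*) groups the optional whitespace without capturing it, so the only
// capture group is (.*?), the text after "www.". Here (?:\s*) behaves exactly
// like a bare \s*.
Regex url = new Regex(@"\[url(?:\s*)\]www\.(.*?)\[/url(?:\s*)\]");

string[] samples =
{
    "[url]www.foo.com[/url]",
    "[url ]www.foo.com[/url ]",
    "[url]www.[/url]",
    "[url]stackoverflow.com[/url]",
};

foreach (string s in samples)
{
    Match m = url.Match(s);
    Console.WriteLine(m.Success ? $"{s} -> \"{m.Groups[1].Value}\"" : $"{s} -> no match");
}
```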

You may find this Regular Expressions Cheat Sheet helpful. I spent ages trying to learn regex with no luck, and once I read this cheat sheet, I immediately understood what I had previously failed to learn.
http://krijnhoetmer.nl/stuff/regex/cheat-sheet/
