Regex matching wrong pattern - c#

I'm trying to pull text out of a Word document using regex lookahead and lookbehind, as found in this answer:
Regular Expression to find a string included between two characters while EXCLUDING the delimiters
The delimiters I have to work with are:
Start: RQ
End: END-RQ
I have added the following (PowerShell) code:
$regex = [regex] '(?<=RQ)(.*?)(?=END-RQ)'
$matches = $regex.Matches($concat)
The problem is that the match treats the RQ in END-RQ as the beginning of the next match. Can anyone tell me how to prevent that (e.g. force the regex to match exactly RQ and END-RQ)? Wrapping the delimiters in quotes does not seem to work, even when the quotes are escaped.

Try this (the nested negative lookbehind (?<!END-) rejects any RQ that directly follows END-):
$regex = [regex] '(?<=(?<!END-)RQ)(.*?)(?=END-RQ)'
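A minimal C# sketch of the difference. PowerShell's [regex] accelerator is the same .NET System.Text.RegularExpressions.Regex class, so the patterns should behave identically there. The sample input is made up, assuming the text has already been concatenated into one string like $concat:

```csharp
using System;
using System.Linq;
using System.Text.RegularExpressions;

// Made-up sample: two RQ ... END-RQ blocks with some junk between them.
string input = "RQ first END-RQ junk RQ second END-RQ";

// Original pattern: the RQ at the end of END-RQ also satisfies (?<=RQ).
string[] naive = Regex.Matches(input, @"(?<=RQ)(.*?)(?=END-RQ)")
    .Cast<Match>().Select(m => m.Value).ToArray();

// Nested lookbehind: (?<!END-) rejects an RQ that directly follows "END-".
string[] strict = Regex.Matches(input, @"(?<=(?<!END-)RQ)(.*?)(?=END-RQ)")
    .Cast<Match>().Select(m => m.Value).ToArray();

Console.WriteLine(string.Join(", ", naive.Select(s => "[" + s + "]")));  // [ first ], [ junk RQ second ]
Console.WriteLine(string.Join(", ", strict.Select(s => "[" + s + "]"))); // [ first ], [ second ]
```

The captured text keeps its surrounding spaces; call Trim() on each value if you don't want them.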

You should download this application:
http://www.sellsbrothers.com/posts/Details/12425
It is priceless when trying to debug a regex.

This might work (hard to say without knowing exactly what your data is). It accepts RQ only at the start of the input or after a character other than a hyphen:
$regex = [regex]'(?<=(?:^|[^-])RQ)(.*?)(?=END-RQ)'
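The two lookbehinds differ on text where RQ follows a hyphen that is not part of END-. A small C# check with a made-up input; which behavior you want depends on your data:

```csharp
using System;
using System.Text.RegularExpressions;

// Made-up input: the opening RQ follows a hyphen that is not part of "END-".
string input = "X-RQ third END-RQ";

// Rejects only an RQ that is preceded by "END-".
int nestedCount = Regex.Matches(input, @"(?<=(?<!END-)RQ)(.*?)(?=END-RQ)").Count;

// Rejects any RQ preceded by a hyphen; accepts RQ at the very start of the input.
int hyphenCount = Regex.Matches(input, @"(?<=(?:^|[^-])RQ)(.*?)(?=END-RQ)").Count;

Console.WriteLine($"{nestedCount} {hyphenCount}"); // 1 0
```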

Related

How can I use unnamed Regex groups in C# inside my regex?

Hey, so my current regex is @"(into)(to)add\s[^\s]{1,}\1|\2[^\s]{1,}". I want the input to be something like "add word into/to category". The regex in general works fine except for the \1|\2 part. I tried using groups and all sorts of solutions, but I just can't figure out how to make it accept either into or to.
Can anyone help me out? (this is in C# and using the Regex class)
If I have understood you correctly, then you don't need backreferences to (unnamed) groups; you can use a simple alternation, like this:
@"add \w+ (into|to) \w+"
That will select either into or to in the search string.
Edit:
Let's get a little more 'advanced', using the optional quantifier '?':
@"add \w+ (in)?to \w+"
This will match 'in' zero or one time, followed by 'to', so it will match into as well as to, exactly like the alternation above.
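A small C# sketch, assuming inputs shaped like the example in the question, showing that both patterns accept into and to but capture different things in group 1:

```csharp
using System;
using System.Text.RegularExpressions;

// Both patterns accept "into" and "to"; they differ in what group 1 captures.
var alternation = new Regex(@"add \w+ (into|to) \w+");
var optional = new Regex(@"add \w+ (in)?to \w+");

foreach (string input in new[] { "add word into category", "add word to category" })
{
    Match a = alternation.Match(input);
    Match o = optional.Match(input);

    // With (into|to), group 1 is the whole word; with (in)?to it is "in" or absent.
    string optionalGroup = o.Groups[1].Success ? o.Groups[1].Value : "(not captured)";
    Console.WriteLine($"{input}: {a.Success}/{o.Success}, {a.Groups[1].Value} vs {optionalGroup}");
}
// add word into category: True/True, into vs in
// add word to category: True/True, to vs (not captured)
```

If you need the preposition itself later, the alternation is the more convenient of the two.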
Edit2:
I have a feeling you want to use a variable inside your regex; you can of course do that like this:
string search = "into|to";
Regex regex = new Regex(@"add \w+ (" + search + @") \w+");
From your example, I think you're looking for a regex like add\s\w+\s(into|to)\s\w+. Your current regex only matches text containing "intotoadd", which is probably not what you want.
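To check the claim about the current regex, here is a C# sketch that runs both patterns against the sample sentence from the question (the sentence is the only assumption):

```csharp
using System;
using System.Text.RegularExpressions;

string sample = "add word into category";

// The question's pattern. The top-level | splits it into two alternatives:
// the first requires the literal text "intotoadd"; the second uses \2, but group 2
// can only capture in the first alternative, and .NET never matches a
// backreference to a group that has not captured anything.
string original = @"(into)(to)add\s[^\s]{1,}\1|\2[^\s]{1,}";

// The suggested pattern.
string suggested = @"add\s\w+\s(into|to)\s\w+";

Console.WriteLine(Regex.IsMatch(sample, original));  // False
Console.WriteLine(Regex.IsMatch(sample, suggested)); // True
```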

Regex Replace with forward slash

Can someone please explain how to get this regex working? I'm trying to take this string:
"Test0/1"
and turn it into:
"Test0\/1"
I'm using this, but it is not working:
var test = Regex.Replace("Test0/1", @"/", @"\/");
It keeps giving me
"Test0\\/1"
Then I want to take the results of the string and put it into a Regex statement like so:
var match = new Regex(test).Match(myString);
So the string 'test' has to be a valid regex statement.
Basically, what I'm trying to do is take a list of interfaces off a device, create a regex out of them, and then use that regex to compare results for other things in my code. Because interfaces are formatted like "FastEthernet0/1", my regex fails, since you have to escape all forward slashes. I have to build this regex on the fly, though, because every device will have a different set of interfaces.
This is just Visual Studio (the debugger display) automatically escaping the \ on your behalf; the string itself contains a single backslash. Look at the following question: What's the use/meaning of the @ character in variable names in C#? Without the @ prefix, the string @"\" has to be written as "\\".
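For the second half of the question, building a pattern on the fly from interface names, .NET's Regex.Escape is the usual tool. It isn't mentioned in the thread, so treat this as a sketch with hypothetical interface names. Note that '/' is not a metacharacter in .NET regex, so it needs no escaping at all:

```csharp
using System;
using System.Linq;
using System.Text.RegularExpressions;

// The replacement does work: the result holds one backslash (8 characters);
// the debugger merely displays it as "Test0\\/1".
string test = Regex.Replace("Test0/1", "/", @"\/");
Console.WriteLine(test);        // Test0\/1
Console.WriteLine(test.Length); // 8

// Hypothetical interface names, as they might come off a device.
string[] interfaces = { "FastEthernet0/1", "GigabitEthernet1/0.100" };

// Regex.Escape escapes metacharacters such as '.'; '/' is left alone.
string pattern = string.Join("|", interfaces.Select(Regex.Escape));
Console.WriteLine(pattern); // FastEthernet0/1|GigabitEthernet1/0\.100

var regex = new Regex(pattern);
Console.WriteLine(regex.IsMatch("interface FastEthernet0/1 is up")); // True
Console.WriteLine(regex.IsMatch("GigabitEthernet1/0x100"));          // False
```

Escaping each name before joining keeps characters like '.' literal, so an interface name can never accidentally act as a wildcard.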

parsing tweet text with regex

Regex noob here. I'm looking for some C# regex code to "syntax highlight" Twitter text. So given this tweet:
@taglius here's some tweet text that shouldn't be highlighted #tagtestpix http://aurl.jpg
I want to find the user mentions (@), hashtags (#), and URLs (http://) and add the appropriate HTML to color-highlight these elements. Something like:
<font color=red>@taglius</font> here's some tweet text that shouldn't be highlighted <font color=blue>#tagtestpix</font> <font color=yellow>http://aurl.jpg</font>
This isn't the exact html I will use, but I think you get the idea.
The other answers cover parts of the whole answer, so I think I can add a little extra to answer your question:
Your highlight function would look something like this:
using System.Text.RegularExpressions;
public static String HighlightTwitter(String input)
{
String result = Regex.Replace(input, @"\B@\w+", @"<font color=""red"">$0</font>");
result = Regex.Replace(result, @"\B#\w+", @"<font color=""blue"">$0</font>");
result = Regex.Replace(result, @"\bhttps?://[-\w]+(\.\w[-\w]*)+(:\d+)?(/[^.!,?;""\'<>()\[\]\{\}\s\x7F-\xFF]*([.!,?]+[^.!,?;""\'<>\(\)\[\]\{\}\s\x7F-\xFF]+)*)?\b", @"<font color=""yellow"">$0</font>", RegexOptions.IgnoreCase);
return result;
}
I have included \B so that @ and # must start a word (the character before them must not be a letter, digit, or underscore), and \b so that URLs stand alone. This means that @this_will_highlight but@this_will_not.
If performance is a concern, you can make the Regex objects static members created with RegexOptions.Compiled.
E.g.:
private static Regex regexAt = new Regex(@"\B@\w+", RegexOptions.Compiled);
...
String result = regexAt.Replace(input, @"<font color=""red"">$0</font>");
...
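Because @ and # are not word characters, the choice of boundary assertion in front of them matters: \b there requires a word character on the left, \B requires a non-word character or the start of the input. A quick C# check using the answer's example sentence:

```csharp
using System;
using System.Linq;
using System.Text.RegularExpressions;

string input = "@this_will_highlight but@this_will_not";

// \b before '@' only holds when a word character is on the left.
string[] withB = Regex.Matches(input, @"\b@\w+")
    .Cast<Match>().Select(m => m.Value).ToArray();

// \B before '@' holds at the start of input or after a non-word character.
string[] withNotB = Regex.Matches(input, @"\B@\w+")
    .Cast<Match>().Select(m => m.Value).ToArray();

Console.WriteLine(string.Join(",", withB));    // @this_will_not
Console.WriteLine(string.Join(",", withNotB)); // @this_will_highlight
```

A negative lookbehind, (?<!\w)@\w+, expresses the same "start of a word" rule more explicitly.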
The following would match the '#' character followed by a sequence of word characters (letters, digits, or underscores):
#\w+
The following would match the '@' character followed by a sequence of word characters:
\@\w+
There are a lot of free-form http url match expressions, this is the one I use most commonly:
https?://[-\w]+(\.\w[-\w]*)+(:\d+)?(/[^.!,?;""\'<>()\[\]\{\}\s\x7F-\xFF]*([.!,?]+[^.!,?;""\'<>\(\)\[\]\{\}\s\x7F-\xFF]+)*)?
Lastly, you're going to get false positives with all of these, so you'll need to look hard at how to delimit these tags correctly. For instance, take the following tweet:
the url http://Roger@example.com/#bookmark is interesting.
Obviously this is going to be a problem, as the @ and # expressions will both match inside the URL. To avoid this, you will need to figure out which characters are allowed to precede or follow a match. As an example, the following requires whitespace or the start of the string to precede the @name reference, and requires a ',' or whitespace to follow it.
(?<![^\s])@\w+(?=[,\s])
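A C# sketch of the rule just described (whitespace or start of input before the @, a ',' or whitespace after the name), written with the negative lookbehind (?<![^\s]), which succeeds when the previous character is whitespace or there is no previous character. The tweet text is made up:

```csharp
using System;
using System.Linq;
using System.Text.RegularExpressions;

// Made-up tweet with a real mention and a URL that contains '@'.
string tweet = "hi @taglius, see http://Roger@example.com/#bookmark now";

// Naive pattern: also matches "@example" inside the URL.
string[] naive = Regex.Matches(tweet, @"@\w+")
    .Cast<Match>().Select(m => m.Value).ToArray();

// Context-restricted pattern: whitespace or start of input before,
// ',' or whitespace after.
string[] strict = Regex.Matches(tweet, @"(?<![^\s])@\w+(?=[,\s])")
    .Cast<Match>().Select(m => m.Value).ToArray();

Console.WriteLine(string.Join(",", naive));  // @taglius,@example
Console.WriteLine(string.Join(",", strict)); // @taglius
```

As written, a mention at the very end of the tweet has nothing after it and is not matched; changing the lookahead to (?=[,\s]|$) would allow that.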
Regex patterns are not easy; I recommend getting a tool like Expresso.
You can parse out the @ replies using (\@\w+). You can parse out the hashtags using (#\w+).

Regex : replace a string

I'm currently facing a (little) blocking issue. I'd like to replace one substring with another using a regular expression. But here is the trick: I suck at regex.
Regex.Replace(contenu, "Request.ServerVariables("*"))",
"ServerVariables('test')");
Basically, I'd like to replace whatever is between the quotes with "test". I tried ".{*}" as a pattern, but it doesn't work.
Could you give me some tips, I'd appreciate it!
There are several issues you need to take care of:
1. You are using characters that are special in a regex (. and parentheses) and quotes, which are special in a C# string literal. You need to escape the regex metacharacters with a backslash. And you need to escape the backslashes with another backslash as well, because we're in a C# string literal, unless you prefix the string with @, in which case the escaping rules are different (backslashes are taken literally and a quote is written as "").
2. The expression to match "any number of whatever characters" is .*. In this case, you would want to match any number of non-quote characters, which is [^"]*.
3. In contrast to (1) above, the replacement string is not a regular expression, so you don't want any backslashes there.
4. You need to store the return value of the replace somewhere.
The end result is
var result = Regex.Replace(contenu,
@"Request\.ServerVariables\(""[^""]*""\)",
"Request.ServerVariables('test')");
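To make point 1 concrete, a C# sketch showing that the verbatim and regular literal spellings produce the same pattern string, followed by the replacement on a made-up input line:

```csharp
using System;
using System.Text.RegularExpressions;

// The same pattern written as a verbatim (@) literal and as a regular literal.
string verbatim = @"Request\.ServerVariables\(""[^""]*""\)";
string regular = "Request\\.ServerVariables\\(\"[^\"]*\"\\)";
Console.WriteLine(verbatim == regular); // True

// Hypothetical input line; the replacement string is plain text, not a regex.
string contenu = "x = Request.ServerVariables(\"HTTP_HOST\")";
string result = Regex.Replace(contenu, verbatim, "Request.ServerVariables('test')");
Console.WriteLine(result); // x = Request.ServerVariables('test')
```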
Based purely on my knowledge of regex (and not how they are done in C#), the pattern you want is probably:
"[^"]*"
i.e., match a ", then match everything that's not a ", then match another ".
You may need to escape the double quotes to make your regex parser actually match on them; that's the part I don't know about C#.
Try to avoid '.*' in a regex where you can; you can usually get what you want by excluding other characters, for example [^"]+ (anything but a quote) or [^)]+ (anything but a closing parenthesis). So you may just want "([^"]+)", which should give you the whole match in Groups[0], and in Groups[1] you'll find the text between the quotes.
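A minimal C# sketch of the group numbering, with a made-up input:

```csharp
using System;
using System.Text.RegularExpressions;

// Hypothetical input containing one double-quoted value.
string text = "Request.ServerVariables(\"HTTP_HOST\")";

// Group 0 is the whole match; group 1 is the parenthesized part inside the quotes.
Match m = Regex.Match(text, "\"([^\"]+)\"");
Console.WriteLine(m.Groups[0].Value); // "HTTP_HOST" (including the quotes)
Console.WriteLine(m.Groups[1].Value); // HTTP_HOST
```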
You could also just replace '"' with '' I think.
Taryn East's regex includes the *. You should remove it if it is just a placeholder for any value:
"[^"]"
BTW: You can test this regex with this cool editor: http://rubular.com/r/1MMtJNF3kM

Pulling data out of quotes?

I'm looking for a regex that can pull out quoted sections in a string, both single and double quotes.
E.g.:
"This is 'an example', \"of an input string\""
Matches:
an example
of an input string
I wrote up this:
[\"|'][A-Za-z0-9\\W]+[\"|']
It works but does anyone see any flaws with it?
EDIT: The main issue I see is that it can't handle nested quotes.
How does it handle single quotes inside of double quotes (or vice versa)?
"This is 'an example', \"of 'quotes within quotes'\""
should match
an example
of 'quotes within quotes'
Use a backreference if you need to support this.
(\"|')[A-Za-z0-9\\W]+?\1
EDIT: Fixed to use a reluctant quantifier.
Like that?
"([\"'])(.*?)\1"
Your desired match would be in group 2, and the kind of quote in group 1.
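Running the backreference version against the nested-quotes input from the comment above, as a C# sketch:

```csharp
using System;
using System.Linq;
using System.Text.RegularExpressions;

// Nested-quotes input from the comment above, as a verbatim literal.
string input = @"This is 'an example', ""of 'quotes within quotes'""";

// Group 1 captures the opening quote; \1 requires the same quote to close the match.
Match[] matches = Regex.Matches(input, @"([""'])(.*?)\1").Cast<Match>().ToArray();

foreach (Match m in matches)
{
    Console.WriteLine($"{m.Groups[1].Value} -> {m.Groups[2].Value}");
}
// ' -> an example
// " -> of 'quotes within quotes'
```

The lazy .*? stops at the first quote that matches the opener, so an inner quote of the other kind stays inside group 2.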
The flaws in your regex are 1) the greedy "+" and 2) [A-Za-z0-9] doesn't really match an awful lot: many characters are not in that range.
It works but doesn't match other characters in quotes (e.g., non-alphanumeric, like binary or foreign language chars). How about this:
[\"']([^\"']*)[\"']
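This character-class version treats any quote as an opener or closer, so it handles the flat example but not the nested-quotes input from the comment above. A C# sketch:

```csharp
using System;
using System.Linq;
using System.Text.RegularExpressions;

string input = @"This is 'an example', ""of 'quotes within quotes'""";

// Any quote can open or close, so the inner ' ends the double-quoted section early.
string[] contents = Regex.Matches(input, @"[""']([^""']*)[""']")
    .Cast<Match>().Select(m => m.Groups[1].Value).ToArray();

Console.WriteLine(string.Join(" | ", contents.Select(c => "[" + c + "]"))); // [an example] | [of ] | []
```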
My C# regex is a little rusty so go easy on me if that's not exactly right :)
@"(""|')(.*?)\1"
You might already have one of these, but, in case not, here's a free, open source tool I use all the time to test my regular expressions. I typically have the general idea of what the expression should look like, but need to fiddle around with some of the particulars.
http://renschler.net/RegexBuilder/
