Regex - Get strings in Quotes ignore escaped Quotes and Comments

I need help with my regex. I use C#.
I need a regex that matches all strings within quotes, but it has to ignore escaped quotes inside a string, and strings that are inside comments, like this:
// "Hello Guys" -> Ignore string
SayHello("Hello i \"need\" our help"); -> match whole string.
The regex I currently use is this: Demo regex

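We can use a negative lookbehind if you know exactly how many characters sit between the comment marker and the string, because in many regex flavors a lookbehind can't contain quantifiers (.NET is an exception: it supports variable-length lookbehind). Something like this: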
(?<!\/\/.)".*?[^\\]"
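Since the question is about C#, the fixed single character "." in the lookbehind can become ".*", because .NET accepts quantifiers there. A minimal sketch, assuming the question's two sample lines as input (the variable names are illustrative). Because "." does not match a newline by default, the lookbehind only looks back along the current line. It will still wrongly skip a string that follows a "//" sitting inside an earlier string on the same line, such as "http://a", "b".

```csharp
using System;
using System.Text.RegularExpressions;

// The two sample lines from the question: a string inside a // comment, then a real string.
string source = "// \"Hello Guys\"\nSayHello(\"Hello i \\\"need\\\" our help\");";

// (?<!\/\/.*) rejects any quote that has "//" earlier on the same line;
// ".*?[^\\]" is the string part of the answer's pattern, unchanged.
var lookbehindRegex = new Regex(@"(?<!\/\/.*)"".*?[^\\]""");

MatchCollection lookbehindMatches = lookbehindRegex.Matches(source);
foreach (Match m in lookbehindMatches)
    Console.WriteLine(m.Value); // prints "Hello i \"need\" our help"
```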
Or do this: first remove all // comments with this regex
\/\/.*
then use this to get all strings:
".*?[^\\]"
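The two-pass variant maps directly onto Regex.Replace followed by Regex.Matches. A minimal sketch on the question's sample input (variable names are illustrative). Two caveats come with the patterns themselves: stripping \/\/.* also truncates strings that contain "//" (URLs, for example), and ".*?[^\\]" needs at least one character between the quotes, so it mishandles the empty string "".

```csharp
using System;
using System.Text.RegularExpressions;

// The two sample lines from the question.
string source = "// \"Hello Guys\"\nSayHello(\"Hello i \\\"need\\\" our help\");";

// Pass 1: delete everything from "//" to the end of the line ('.' stops at '\n').
string withoutComments = Regex.Replace(source, @"\/\/.*", "");

// Pass 2: a quote, as few characters as possible, a non-backslash, then a quote.
MatchCollection strings = Regex.Matches(withoutComments, @""".*?[^\\]""");
foreach (Match m in strings)
    Console.WriteLine(m.Value); // prints "Hello i \"need\" our help"
```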

Excluding things (dealing with complements) is not what regular expressions are good at (I mean regular expressions in the spirit of automata theory and formal languages, not counting some more or less exotic extensions).
The // comments would need to be filtered out, or replaced by harmless content, in a pass of their own.
The \" escapes would need similar treatment.
Then you could match the defused content with a simple regular expression.
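The defusing approach can be sketched in C# as three passes. This is an illustration, not the answerer's code: the placeholder character \u0001 is an assumption (any sequence that never occurs in the input would do). Like any line-based comment strip, it also cuts strings that contain "//", and it assumes a backslash is never itself escaped right before a closing quote (as in "C:\\"). Because the \" escapes are hidden before matching, the final pattern can be the plain "[^"]*", which also handles empty strings.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

string source = "// \"Hello Guys\"\nSayHello(\"Hello i \\\"need\\\" our help\");";

// Assumed placeholder: a character that never appears in the input.
const string Placeholder = "\u0001";

// Pass 1: filter out // comments (up to the end of each line).
string noComments = Regex.Replace(source, @"//.*", "");

// Pass 2: defuse the \" escapes so they no longer look like quotes.
string defused = noComments.Replace("\\\"", Placeholder);

// Pass 3: a string is now just a quote, non-quote characters, and a quote.
var strings = new List<string>();
foreach (Match m in Regex.Matches(defused, "\"[^\"]*\""))
    strings.Add(m.Value.Replace(Placeholder, "\\\"")); // restore the escapes

foreach (string s in strings)
    Console.WriteLine(s); // prints "Hello i \"need\" our help"
```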

Related

Regular Expression Space character not working

My regex is for a Canadian postal code and only allows the valid letters:
Regex pattern = new Regex("^[ABCEGHJKLMNPRSTVXY][0-9][ABCEGHJKLMNPRSTVWXYZ][/s][0-9][ABCEGHJKLMNPRSTVWXYZ][0-9]$");
The problem I am having is that I want to allow a space between the two sets of three characters, but I cannot find the correct character to use.

You've got a forward-slash instead of a backslash in your regular expression for whitespace (\s). The following regex should work.
@"^[ABCEGHJKLMNPRSTVXY][0-9][ABCEGHJKLMNPRSTVWXYZ][\s][0-9][ABCEGHJKLMNPRSTVWXYZ][0-9]$"

If you are simply searching for a space, use \s (note that \s matches any whitespace character, including tabs, not only a space).
To pass the backslash through to the regex engine unchanged, use a verbatim string literal (prefix the string with @), as in the example below.
Regex pattern = new Regex(@"^[ABCEGHJKLMNPRSTVXY][0-9][ABCEGHJKLMNPRSTVWXYZ]\s[0-9][ABCEGHJKLMNPRSTVWXYZ][0-9]$");
As pointed out in the comments, if the space is optional you can use the ? quantifier, as below.
Regex pattern = new Regex(@"^[ABCEGHJKLMNPRSTVXY][0-9][ABCEGHJKLMNPRSTVWXYZ]\s?[0-9][ABCEGHJKLMNPRSTVWXYZ][0-9]$");
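A quick check of the optional-space pattern. This is a sketch, and the sample postal codes are made up for illustration. Because \s accepts any single whitespace character (a tab included), a literal space followed by ? would be the stricter choice if only a space should be allowed.

```csharp
using System;
using System.Text.RegularExpressions;

// Letter-digit-letter, an optional single whitespace character, digit-letter-digit.
// The verbatim string (@"...") passes \s to the regex engine unchanged.
var postalCode = new Regex(@"^[ABCEGHJKLMNPRSTVXY][0-9][ABCEGHJKLMNPRSTVWXYZ]\s?[0-9][ABCEGHJKLMNPRSTVWXYZ][0-9]$");

Console.WriteLine(postalCode.IsMatch("K1A 0B1"));  // True: one space
Console.WriteLine(postalCode.IsMatch("K1A0B1"));   // True: no space
Console.WriteLine(postalCode.IsMatch("K1A  0B1")); // False: two spaces
Console.WriteLine(postalCode.IsMatch("D1A 0B1"));  // False: D is not a valid first letter
```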

Use the \s token for whitespace instead of /s.
Some handy tools to speed up regex development:
regexr.com helps with syntax and provides realtime testing
regexpr.com (yes I know :)) visualizes your expression.

As per the other answers:
Use \s instead of /s.
You don't need square brackets around \s ([\s]), because \s is already a character class on its own.
Also:
In C#, a backslash inside an ordinary "..." string literal starts an escape sequence, so "\s" is rejected by the compiler as an unrecognized escape sequence. Use a verbatim string (@"\s") or double the backslash ("\\s") so that the regex engine receives \s.
Use a trailing quantifier, \s* or \s?, to allow the space to be optional.
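The two ways of writing the backslash, and the difference between the two quantifiers, can be checked side by side. A small sketch; the input string is an arbitrary example.

```csharp
using System;
using System.Text.RegularExpressions;

// Two ways to hand the two characters \s to the regex engine from C#.
string escaped = "\\s";  // ordinary literal: the backslash is doubled
string verbatim = @"\s"; // verbatim literal: backslashes are taken as-is
Console.WriteLine(escaped == verbatim); // True

// \s? allows at most one whitespace character, \s* allows any number.
bool oneAtMost = Regex.IsMatch("K1A  0B1", @"^K1A\s?0B1$");
bool anyNumber = Regex.IsMatch("K1A  0B1", @"^K1A\s*0B1$");
Console.WriteLine(oneAtMost); // False: there are two spaces
Console.WriteLine(anyNumber); // True
```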

Do I need to escape match substitutions in my .NET Regex pattern?

I am converting some code from Perl into .NET. I have this s/// pattern replacement:
$text =~ s/\Q\1\E"/$1/gi;
I use the Perl \Q and \E to escape the text I am matching in the first capture, the URL portion of an HREF attribute.
How do I do the same thing in .NET? I've read about using Regex.Escape() to escape text for a regular expression, but how can I use it WITHIN the regular expression that is performing the match? (Or do I even need to do that?)
So right now I'm not doing anything special, and wondering if this will continue to work as well as my Perl regex has been:
text = Regex.Replace(text, @"\1", "$1", RegexOptions.IgnoreCase);

You don't need the \Q...\E in the Perl. In fact, it will keep your Perl from working, because it means the '\1' is taken literally instead of being replaced with the contents of the first match.
\Q...\E is for quoting characters that are special in a regex. The text matched by a backreference is already taken literally; it is not re-evaluated as regular expression syntax.
The same goes for anything inserted into the replacement string. You only need \Q...\E, or Regex.Escape(), if you have a variable from outside regex-land whose text you want to interpolate into a regular expression without accidentally treating parts of it as regular expression metacharacters.
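Both halves of this answer can be checked in C#. A minimal sketch; the inputs and the URL are illustrative. The first part shows that \1 matches the captured text literally, so a "." captured by the group is not treated as "any character". The second shows why outside text does need Regex.Escape() before it is spliced into a pattern.

```csharp
using System;
using System.Text.RegularExpressions;

// Group 1 captures "a.c"; \1 then requires exactly "a.c" again, so "axc" does not match.
bool literalRepeat = Regex.IsMatch("a.c a.c", @"^(\S+) \1$");
bool differentText = Regex.IsMatch("a.c axc", @"^(\S+) \1$");
Console.WriteLine(literalRepeat); // True
Console.WriteLine(differentText); // False

// Text from outside regex-land must be escaped before it becomes part of a pattern.
string url = "http://example.com/page?id=1";
string input = "href=\"http://example.com/page?id=1\"";
bool escapedFound = Regex.IsMatch(input, Regex.Escape(url) + "\"");
bool rawFound = Regex.IsMatch(input, url + "\"");
Console.WriteLine(escapedFound); // True
Console.WriteLine(rawFound);     // False: the unescaped '?' made the 'e' optional
```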

Regex : replace a string

I'm currently facing a (little) blocking issue. I'd like to replace one substring with another using a regular expression. But here's the catch: I suck at regex.
Regex.Replace(contenu, "Request.ServerVariables("*"))",
"ServerVariables('test')");
Basically I'd like to replace whatever is between the quotes with "test". I tried ".{*}" as a pattern, but it doesn't work.
Could you give me some tips, I'd appreciate it!

There are several issues you need to take care of.
1. You are using characters that are special in a regex (. and the parentheses); you need to escape these with a backslash. In an ordinary C# string literal the backslashes themselves must be doubled and each quote written as \"; if you prefix the string with @ (a verbatim string), backslashes are taken as-is and a quote is written as "" instead.
2. The expression to match "any number of whatever characters" is .*. In this case, you want any number of non-quote characters, which is [^"]*.
3. In contrast to (1) above, the replacement string is not a regular expression, so you don't want any backslashes there.
4. You need to store the return value of the replace somewhere: Regex.Replace returns a new string and does not modify its input.
The end result is
var result = Regex.Replace(contenu,
@"Request\.ServerVariables\(""[^""]*""\)",
"Request.ServerVariables('test')");
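A runnable version of the final snippet. The value of contenu is a made-up sample line; only the pattern and the replacement come from the answer.

```csharp
using System;
using System.Text.RegularExpressions;

// Hypothetical input containing one ServerVariables call.
string contenu = "string host = Request.ServerVariables(\"HTTP_HOST\");";

// Verbatim string: \. and \( reach the regex engine as-is, and "" stands for one quote.
var result = Regex.Replace(contenu,
    @"Request\.ServerVariables\(""[^""]*""\)",
    "Request.ServerVariables('test')");

Console.WriteLine(result); // string host = Request.ServerVariables('test');
```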

Based purely on my knowledge of regex (and not how they are done in C#), the pattern you want is probably:
"[^"]*"
i.e. match a ", then everything that's not a ", then another ".
The double quotes are not special to the regex engine, so the pattern itself needs no escaping. In C# they only need escaping in the string literal: "\"[^\"]*\"" in an ordinary string, or @"""[^""]*""" in a verbatim one.

Where you can, try to avoid .* in a regex; you can usually describe what you want by excluding other characters instead, for example [^"]+ for "no quotes", or [^)]+ for "no closing parenthesis". So you may just want "([^"]+)", which should give you the whole match in [0] and the text between the quotes in [1].
You could also just replace every " with an empty string, I think.
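In C#, the [0] and [1] mentioned above correspond to Match.Groups[0] (the whole match) and Match.Groups[1] (the first capture group). A small sketch with an illustrative input:

```csharp
using System;
using System.Text.RegularExpressions;

// "([^"]+)" : a quote, a captured run of non-quote characters, a quote.
Match quoted = Regex.Match("Request.ServerVariables(\"test\")", "\"([^\"]+)\"");

Console.WriteLine(quoted.Groups[0].Value); // "test" (the whole match, quotes included)
Console.WriteLine(quoted.Groups[1].Value); // test
```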

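Taryn East's regex includes the *. Note that it is a quantifier ("zero or more"), not a placeholder for any value; if you remove it, the pattern
"[^"]"
matches exactly one non-quote character between the quotes.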
BTW: You can test this regex with this cool editor: http://rubular.com/r/1MMtJNF3kM
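A quick C# check of what the * in "[^"]*" changes, using a made-up input: * is the "zero or more" quantifier applied to [^"], so without it only single-character quoted strings match.

```csharp
using System;
using System.Text.RegularExpressions;

string sample = "ServerVariables(\"HTTP_HOST\") and (\"x\")";

// With *: any number of non-quote characters between the quotes.
MatchCollection withStar = Regex.Matches(sample, "\"[^\"]*\"");
foreach (Match m in withStar)
    Console.WriteLine(m.Value); // "HTTP_HOST", then "x"

// Without *: exactly one non-quote character between the quotes.
MatchCollection withoutStar = Regex.Matches(sample, "\"[^\"]\"");
foreach (Match m in withoutStar)
    Console.WriteLine(m.Value); // only "x"
```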

'-' not working while using Regular Expressions to match special characters, c#

Pattern is
Regex splRegExp = new System.Text.RegularExpressions.Regex(@"[\,#,+,\,?,\d,%,.,?,*,&,^,$,(,!,),#,-,_]");
All characters work except '-'. Please advise.

Use
@"[,#+\\?\d%.*&^$(!)#_-]"
No need for all those commas.
If you place a - inside a character class, it means a literal dash only if it's at the start or end of the class. Otherwise it denotes a range like A-Z. As Damien put it, the range ,-, is indeed rather small (and doesn't contain the -, of course).
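One way to see the accidental ,-, range is to test a single "-" against the question's class and against the rewritten ones. A sketch using IsMatch on one-character inputs; the escaped variant reflects the other common fix, writing the dash as \- anywhere in the class.

```csharp
using System;
using System.Text.RegularExpressions;

// The question's class: the ",-," near the end is a range from ',' to ',', so '-' is excluded.
var original = new Regex(@"[\,#,+,\,?,\d,%,.,?,*,&,^,$,(,!,),#,-,_]");
// Commas removed and '-' moved to the end, where it is a literal dash.
var dashAtEnd = new Regex(@"[,#+\\?\d%.*&^$(!)#_-]");
// '-' kept in the middle but escaped.
var dashEscaped = new Regex(@"[,#+\\?\d%.*&^$(!)#\-_]");

Console.WriteLine(original.IsMatch("-"));    // False
Console.WriteLine(dashAtEnd.IsMatch("-"));   // True
Console.WriteLine(dashEscaped.IsMatch("-")); // True
Console.WriteLine(dashAtEnd.IsMatch("a"));   // False: letters are not in the class
```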

The '-' has to be the first (or last) character in the character class:
Regex splRegExp = new System.Text.RegularExpressions.Regex(@"[-,\,#,+,\,?,\d,%,.,?,*,&,^,$,(,!,),#,_]");

You need to escape the - character for it to work (inside a character class it is regular expression syntax for a range).
Try this:
@"[\,#,+,\,?,\d,%,.,?,*,&,^,$,(,!,),#,\-,_]"

regular expression should split , that are contained outside the double quotes in a CSV file?

This is the sample
"abc","abcsds","adbc,ds","abc"
Output should be
abc
abcsds
adbc,ds
abc

Try this:
"(.*?)"
if you need to put this regex inside a literal, don't forget to escape it:
Regex re = new Regex("\"(.*?)\"");
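With Matches, the text between the quotes ends up in capture group 1. A sketch on the question's sample line; note that this lazy pattern does not understand doubled "" quotes inside a field.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

string line = "\"abc\",\"abcsds\",\"adbc,ds\",\"abc\"";
Regex re = new Regex("\"(.*?)\"");

var values = new List<string>();
foreach (Match m in re.Matches(line))
    values.Add(m.Groups[1].Value); // the text between the quotes

// Prints abc, abcsds, adbc,ds and abc, one per line.
Console.WriteLine(string.Join(Environment.NewLine, values));
```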

This is a tougher job than you realize: not only can there be commas inside the quotes, but there can also be quotes inside the quotes. Two consecutive quotes inside a quoted string do not signal the end of the string. Instead, they signal a quote embedded in the string, so for example:
"x", "y,""z"""
should be parsed as:
x
y,"z"
So, the basic sequence is something like this:
1. Find the first non-whitespace character.
2. If it was a quote, read up to the next quote, then read the next character.
   a. Repeat until that next character is not also a quote.
   b. If the next (non-whitespace) character is not a comma (or the end of the line), the input is malformed.
3. If it was not a quote, read up to the next comma.
4. Skip the comma and repeat the whole process for the next field.
Note that despite the tag, I'm not providing a regex -- I'm not at all sure I've seen a regex that can really handle this properly.
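The steps above translate fairly directly into a small hand-written splitter. This is a sketch, not code from the answer: SplitCsvLine is a hypothetical name, it handles a single line (no newlines inside quoted fields), and it throws FormatException for the malformed cases the steps describe.

```csharp
using System;
using System.Collections.Generic;
using System.Text;

// Hypothetical helper following the steps above for a single CSV line.
static List<string> SplitCsvLine(string line)
{
    var fields = new List<string>();
    int i = 0;
    while (true)
    {
        // Step 1: skip whitespace before the field.
        while (i < line.Length && char.IsWhiteSpace(line[i])) i++;
        var field = new StringBuilder();

        if (i < line.Length && line[i] == '"')
        {
            i++; // step 2: past the opening quote
            while (true)
            {
                int close = line.IndexOf('"', i);
                if (close < 0) throw new FormatException("Unterminated quoted field.");
                field.Append(line, i, close - i);
                i = close + 1;
                // A second quote right after the first is an embedded quote: keep reading.
                if (i < line.Length && line[i] == '"') { field.Append('"'); i++; }
                else break;
            }
            // Only whitespace may separate the closing quote from the comma.
            while (i < line.Length && char.IsWhiteSpace(line[i])) i++;
            if (i < line.Length && line[i] != ',')
                throw new FormatException("Expected a comma after a quoted field.");
        }
        else
        {
            // Step 3: an unquoted field runs up to the next comma or the end of the line.
            int comma = line.IndexOf(',', i);
            int end = comma < 0 ? line.Length : comma;
            field.Append(line, i, end - i);
            i = end;
        }

        fields.Add(field.ToString());
        if (i >= line.Length) return fields;
        i++; // step 4: skip the comma and read the next field
    }
}

foreach (string value in SplitCsvLine("\"x\", \"y,\"\"z\"\"\""))
    Console.WriteLine(value); // prints x, then y,"z"
```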

This answer has a C# solution for dealing with CSV.
In particular, the line
private static Regex rexCsvSplitter = new Regex( @",(?=(?:[^""]*""[^""]*"")*(?![^""]*""))" );
contains the Regex used to split properly, i.e., taking quoting and escaping into consideration.
Basically what it says is, match any comma that is followed by an even number of quote marks (including zero). This effectively prevents matching a comma that is part of a quoted string, since the quote character is escaped by doubling it.
Keep in mind that the quotes in the above line are doubled for the sake of the string literal. It might be easier to think of the expression as
,(?=(?:[^"]*"[^"]*")*(?![^"]*"))
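Regex.Split with this pattern leaves each field still wrapped in its quotes, with embedded quotes still doubled. A minimal sketch of the follow-up step; Unquote is a hypothetical helper, not part of the linked answer, and the input is the doubled-quote example from the previous answer.

```csharp
using System;
using System.Linq;
using System.Text.RegularExpressions;

// The splitter from the linked answer: split on commas followed by an even number of quotes.
var rexCsvSplitter = new Regex(@",(?=(?:[^""]*""[^""]*"")*(?![^""]*""))");

// Hypothetical helper: trim whitespace, drop the outer quotes, turn "" back into ".
static string Unquote(string raw)
{
    string s = raw.Trim();
    if (s.Length >= 2 && s[0] == '"' && s[s.Length - 1] == '"')
        s = s.Substring(1, s.Length - 2).Replace("\"\"", "\"");
    return s;
}

string[] parts = rexCsvSplitter.Split("\"x\", \"y,\"\"z\"\"\"").Select(Unquote).ToArray();
foreach (string part in parts)
    Console.WriteLine(part); // prints x, then y,"z"
```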

If you can be sure there are no inner, escaped quotes, then I guess it's ok to use a regular expression for this. However, most modern languages already have proper CSV parsers.
Using a proper parser is the correct answer here; Text::CSV for Perl, for example.
However, if you're dead set on using regular expressions, I'd suggest you "borrow" from some sort of module, like this one:
http://metacpan.org/pod/Regexp::Common::balanced
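On the .NET side, one such parser ships with the framework: TextFieldParser in the Microsoft.VisualBasic.FileIO namespace, which is usable from C# as well. A minimal sketch on the question's sample line, reading from a StringReader instead of a file:

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using Microsoft.VisualBasic.FileIO;

string csv = "\"abc\",\"abcsds\",\"adbc,ds\",\"abc\"";
var rows = new List<string[]>();

// TextFieldParser reads delimited text and honors fields enclosed in quotes.
using (var parser = new TextFieldParser(new StringReader(csv)))
{
    parser.SetDelimiters(",");
    parser.HasFieldsEnclosedInQuotes = true;
    while (!parser.EndOfData)
        rows.Add(parser.ReadFields());
}

// Prints abc, abcsds, adbc,ds and abc, one per line.
foreach (string field in rows[0])
    Console.WriteLine(field);
```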
