I am trying to write a regular expression that will match a string that contains name-value pairs of the form:
<name> = <value>, <name> = <value>, ...
Where <value> is a C# string literal. I already know the <name>s that I need to find via this regular expression. So far I have the following:
regex = new Regex(fieldName + @"\s*=\s*""(.*?)""");
This works well, but of course it fails when the string I am trying to match contains a <value> with an escaped quote. I am struggling to work out how to solve this. I think I need a lookahead, but I need a few pointers. As an example, I would like to be able to match the value of the 'difficult' named value below:
difficult = "\\\a\b\'\"\0\f \t\v", easy = "one"
I would appreciate a decent explanation with your answers; I want to learn rather than copy ;-)
Try this to capture the key and value:
(\w+)\s*=\s*(@"(?:[^"]|"")*"|"(?:\\.|[^\\"])*")
As a bonus, it also works on verbatim strings.
C# Examples: https://dotnetfiddle.net/vQP4rn
Here's an annotated version:
string pattern = @"
    (\w+)\s*=\s*    # key =
    (               # Capturing group for the string
        @""         # verbatim string - match literal at-sign and a quote
        (?:
            [^""]|""""  # match a non-quote character, or two quotes
        )*          # zero times or more
        ""          # literal quote
    |               # OR - regular string
        ""          # string literal - opening quote
        (?:
            \\.     # match an escaped character,
            |[^\\""]  # or a character that isn't a quote or a backslash
        )*          # a few times
        ""          # string literal - closing quote
    )";
MatchCollection matches = Regex.Matches(s, pattern,
    RegexOptions.IgnorePatternWhitespace);
Note that the regular string allows all characters to be escaped, unlike in C#, and allows newlines. It should be easy to correct if you need validation, but it should be fine for parsing.
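To see the pattern at work on the line from the question, here is a minimal sketch. The class name KeyValueDemo and the Parse helper are made up for illustration; the pattern is the same one as above, just written on a single line.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

public static class KeyValueDemo
{
    // Same pattern as above on one line, as a verbatim string
    // (every " inside the literal is doubled).
    private static readonly Regex Pair = new Regex(
        @"(\w+)\s*=\s*(@""(?:[^""]|"""")*""|""(?:\\.|[^\\""])*"")");

    // Hypothetical helper: returns (name, raw literal) pairs.
    // The literal keeps its surrounding quotes and its escapes.
    public static List<KeyValuePair<string, string>> Parse(string input)
    {
        var result = new List<KeyValuePair<string, string>>();
        foreach (Match m in Pair.Matches(input))
        {
            result.Add(new KeyValuePair<string, string>(
                m.Groups[1].Value, m.Groups[2].Value));
        }
        return result;
    }

    public static void Main()
    {
        // The example line from the question, written as a verbatim string.
        string s = @"difficult = ""\\\a\b\'\""\0\f \t\v"", easy = ""one""";
        foreach (var pair in Parse(s))
        {
            Console.WriteLine(pair.Key + " -> " + pair.Value);
        }
    }
}
```

This should print one line for difficult with the complete literal, escaped quote included, and one line for easy with "one".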
This should match only the string literal part (you can tack on whatever else you want to the beginning/end):
Regex regex = new Regex("\"((\\\\.)|[^\\\\\"])*\"");
and if you want a pattern which doesn't allow "multi-line" string literals (as C# string literals really are):
Regex regex = new Regex("\"((\\\\[^\n\r])|[^\\\\\"\n\r])*\"");
You can use this:
@" \s* = \s* (?<!\\)"" (.* ) (?<!\\)"""
It's almost like yours, but instead of a plain "", I used (?<!\\)"" so that a quote only matches when it is not preceded by a \, which means escaped quotes are skipped.
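One case worth checking with the lookbehind approach is a value that ends in an escaped backslash, because the closing quote is then preceded by a \ too. Here is a rough sketch comparing it with the \\. alternation used in the other answers. The class name, the helper and both inputs are made up, and the lookbehind pattern uses a lazy (.*?) so it stops at the first candidate quote.

```csharp
using System;
using System.Text.RegularExpressions;

public static class LookbehindDemo
{
    // Lookbehind approach (lazy variant): a quote counts only if no \ precedes it.
    public static readonly Regex Lookbehind =
        new Regex(@"(?<!\\)""(.*?)(?<!\\)""");

    // Alternation approach: consume either an escape pair (\ plus any char)
    // or a single character that is neither a quote nor a backslash.
    public static readonly Regex Alternation =
        new Regex(@"""((?:\\.|[^\\""])*)""");

    // Hypothetical helper: content of the first string literal found.
    public static string FirstValue(Regex regex, string input)
    {
        return regex.Match(input).Groups[1].Value;
    }

    public static void Main()
    {
        // a = "say \"hi\"", b = "y"   (escaped quotes inside the value)
        string escapedQuotes = @"a = ""say \""hi\"""", b = ""y""";
        // a = "x\\", b = "y"          (value ends with an escaped backslash)
        string escapedBackslash = @"a = ""x\\"", b = ""y""";

        foreach (string input in new[] { escapedQuotes, escapedBackslash })
        {
            Console.WriteLine("lookbehind:  " + FirstValue(Lookbehind, input));
            Console.WriteLine("alternation: " + FirstValue(Alternation, input));
        }
    }
}
```

Both patterns agree on the first input. On the second, the lookbehind version rejects the real closing quote and runs on to the next one, while the alternation version stops after x\\.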
I have these examples:
{I18n.get("Testing 123...")}
{I18n.get('Testing 123...')}
{I18n.get(
"Testing 123..."
)}
{I18n.get("Testing 123..."
)}
{I18n.get(
"Testing 123...")}
I want to extract the 'Testing 123...' in .Net using C# Regex. What I did was:
Regex r = new Regex(@"(?:I18n.get\(""(.+?)""\))", RegexOptions.IgnoreCase | RegexOptions.Singleline);
var matches = r.Matches(txt)
.Select(xx=> xx.Groups)
.Select(xx=> xx.Last().Value)
.ToList();
When the call is on a single line it works perfectly, but when it spans multiple lines it fails...
And how would it be possible to match, with a single Regex, both when the text uses double quotes " and when it uses single quotes ' ?
You may use
var r = new Regex(@"I18n\.get\(\s*(""|')(.*?)\1\s*\)", RegexOptions.IgnoreCase | RegexOptions.Singleline);
var results = r.Matches(txt).Cast<Match>().Select(x => x.Groups[2].Value).ToList();
See the regex demo.
Details
I18n\.get\( - a literal I18n.get( text
\s* - 0+ whitespaces
("|') - Group 1: " or '
(.*?) - Group 2: any char, 0 or more occurrences, as few as possible
\1 - same value as captured in Group 1
\s* - 0+ whitespaces
\) - a ) char.
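Putting the pieces above together, here is a small sketch that runs the pattern over the five sample calls from the question. The class name I18nDemo and the Extract helper are made up for illustration.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text.RegularExpressions;

public static class I18nDemo
{
    private static readonly Regex Call = new Regex(
        @"I18n\.get\(\s*(""|')(.*?)\1\s*\)",
        RegexOptions.IgnoreCase | RegexOptions.Singleline);

    // Hypothetical helper: returns Group 2 (the text between the quotes)
    // for every I18n.get(...) call found in txt.
    public static List<string> Extract(string txt)
    {
        return Call.Matches(txt).Cast<Match>()
                   .Select(m => m.Groups[2].Value)
                   .ToList();
    }

    public static void Main()
    {
        // The five variants from the question: double quotes, single quotes,
        // and line breaks before and/or after the quoted text.
        string txt =
            "{I18n.get(\"Testing 123...\")}\n" +
            "{I18n.get('Testing 123...')}\n" +
            "{I18n.get(\n    \"Testing 123...\"\n)}\n" +
            "{I18n.get(\"Testing 123...\"\n)}\n" +
            "{I18n.get(\n    \"Testing 123...\")}";

        foreach (string value in Extract(txt))
        {
            Console.WriteLine(value);
        }
    }
}
```

Because \1 has to repeat whatever Group 1 captured, a call that opens with " and closes with ' is not matched.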
I contend it doesn't matter which opening and closing quotes appear in the regex when you don't intend to actually parse the content as a quoted string, right?
I mean with all the embedded escapes, etc.
Use what you know as the delimiters, the literal text
I18n.get( stuff here )
You could add a sub-validation that there is an inner quote, but since you're
not parsing quotes, it's not going to be strictly valid anyway.
Here, the quotes are matched only inside the lookarounds, so they are trimmed off rather than becoming part of the match.
The whole match is the value you're looking for, so just toss it into an array.
@"(?s)(?<=I18n\s*\.\s*get\s*\(\s*['""]\s*).*?(?=\s*['""]\s*\))"
My Code is like this:
string currentPageSlug = "securities/EBR#03L$ZZZ";
string patern= @"securities/(\w+)[\#\$]";
string res = Regex.Match(currentPageSlug, patern).Value;
Console.WriteLine(res);
which gives me this result:
securities/EBR#
but I want to get:
securities/EBR#03L$ZZZ
that is, the whole word including all special characters (# and $ and maybe others too).
My regex pattern does not seem to work.
Your regex matches words followed by a single special character. You need to include [#$] in the repeating construct +, like this:
string patern= @"securities/((?:\w|[#$])+)";
Note that since # and $ are used inside a character class, it is not necessary to escape them with a backslash \.
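A quick sketch comparing the original pattern with the corrected one. The class and helper names are made up; the third pattern, which folds \w into the same character class, is just an equivalent spelling and not part of the answer above.

```csharp
using System;
using System.Text.RegularExpressions;

public static class SlugDemo
{
    // Hypothetical helper: text of the first match, or "" if there is none.
    public static string MatchSlug(string pattern, string slug)
    {
        return Regex.Match(slug, pattern).Value;
    }

    public static void Main()
    {
        string slug = "securities/EBR#03L$ZZZ";

        // Original: word characters, then exactly one special character.
        Console.WriteLine(MatchSlug(@"securities/(\w+)[\#\$]", slug));

        // Corrected: the special characters are inside the repeated group.
        Console.WriteLine(MatchSlug(@"securities/((?:\w|[#$])+)", slug));

        // Equivalent spelling: \w can sit in the same character class.
        Console.WriteLine(MatchSlug(@"securities/([\w#$]+)", slug));
    }
}
```

The first call stops at securities/EBR#, while the other two return the whole slug.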
Let's suppose we have the following string:
@"Hello m\u00e9 name is Mat\u00bfQu"
I am using the regex:
private static readonly Regex ESCAPING_REGEX = new Regex("\\+[^\"][a-zA-Z0-9]*", RegexOptions.Compiled);
However, this regex doesn't seem to return any matches:
MatchCollection matches = ESCAPING_REGEX.Matches(text);
// matches.Count == 0
I tried the regex on Regex101 and it does return the two matches that I was looking for.
How can I fix my regular expression to achieve expected behavior? (Any tips for improvement are gladly accepted.)
Your regex declaration is faulty because you require a literal + to be in the beginning of the match. Look what your regex looks like for a regex engine:
\+ - Matches a literal +
[^"] - Matches any character other than "
[a-zA-Z0-9]* - Matches 0 or more characters that are digits or Latin letters.
If you use a verbatim string literal to create your regex, e.g.
Regex.Matches(str, @"\\+[^""][a-zA-Z0-9]*");
you'd get 2 matches. \\ in a verbatim string literal will match a literal \, and + will be treated as a quantifier.
Actually, you do not even need the + (it only lets the pattern also match runs of several backslashes, such as \\\\) or the [^""] (unless a " can appear right after the \ and you do not want to match that). You can use
@"\\[a-zA-Z0-9]+"
to match your substrings (\\ matches a \, [a-zA-Z0-9]+ will match 1 or more characters from the range).
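The difference between the two literals is easy to check by counting matches. A minimal sketch, with a made-up class name and Count helper, using the input string from the question:

```csharp
using System;
using System.Text.RegularExpressions;

public static class EscapeDemo
{
    // Hypothetical helper: number of matches of pattern in text.
    public static int Count(string pattern, string text)
    {
        return Regex.Matches(text, pattern).Count;
    }

    public static void Main()
    {
        // Verbatim string: the backslashes stay in the text as-is.
        string text = @"Hello m\u00e9 name is Mat\u00bfQu";

        // Regular literal: the regex engine receives \+[^"][a-zA-Z0-9]*
        string regular = "\\+[^\"][a-zA-Z0-9]*";
        // Verbatim literal: the regex engine receives \\+[^"][a-zA-Z0-9]*
        string verbatim = @"\\+[^""][a-zA-Z0-9]*";
        // Simplified pattern suggested above.
        string simplified = @"\\[a-zA-Z0-9]+";

        Console.WriteLine(regular + " : " + Count(regular, text));
        Console.WriteLine(verbatim + " : " + Count(verbatim, text));
        Console.WriteLine(simplified + " : " + Count(simplified, text));

        foreach (Match m in Regex.Matches(text, simplified))
        {
            Console.WriteLine(m.Value);
        }
    }
}
```

The regular literal finds nothing because the text contains no + character. The other two each find \u00e9 and \u00bfQu.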
I'm using C# and want to use the following regular expression in my code:
sDatabaseServer\s*=\s*"([^"]*)"
I have placed it in my code as:
Regex databaseServer = new Regex(@"sDatabaseServer\s*=\s*"([^"]*)"", RegexOptions.Compiled | RegexOptions.IgnorePatternWhitespace);
I know you have to escape all parentheses and quotes inside the string quotes, but for some reason the following still does not work:
Working Version:
Regex databaseServer = new Regex(@"sDatabaseServer\s*=\s*""([^""]*)""", RegexOptions.Compiled | RegexOptions.IgnorePatternWhitespace);
Any ideas how to get C# to see my regex as just a string? I know, I know... easy question. Sorry, I'm still somewhat of an amateur at C#.
SOLVED: Thanks guys!
You went one step too far when you escaped the parentheses. If you want them to be regex meta-characters (i.e. a capturing group), then you must not escape them. Otherwise they will match literal parentheses.
So this is probably what you are looking for:
@"sDatabaseServer\s*=\s*""([^""]*)"""
string regex = "sDatabaseServer\\s*=\\s*\"([^\"]*)\"";
In your first try, you forgot to escape your quotes. But since it's a verbatim string literal, escaping with a \ doesn't work.
In your second try, you escaped the quotes, but you didn't escape the \ that's needed for your whitespace token \s.
Use \x22 instead of quotes:
string pattern = @"sDatabaseServer\s*=\s*\x22([^\x22]*)\x22";
But IgnorePatternWhitespace is there to allow comments in the regex pattern (introduced by the # sign) and to let the pattern be split over multiple lines. You use neither, so remove it.
A better pattern for what you seek is
string pattern = @"(?:sDatabaseServer\s*=\s*\x22)([^\x22]+)(?:\x22)";
(?: ) matches but doesn't capture, and acts like an anchor for the parser. The pattern also assumes there will be at least one character between the quotes, hence the + instead of the *.
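To compare the spellings suggested in the answers above, here is a rough sketch. The class name, the Server helper and the sample line with the value db01 are made up. The verbatim literal (doubled quotes) and the regular literal (backslash escapes) produce the same pattern text, and the \x22 variant matches the same input.

```csharp
using System;
using System.Text.RegularExpressions;

public static class QuoteSpellingDemo
{
    // Verbatim literal: " is written as "", backslashes are left alone.
    public const string Verbatim = @"sDatabaseServer\s*=\s*""([^""]*)""";
    // Regular literal: " is written as \", each \ is written as \\.
    public const string Regular = "sDatabaseServer\\s*=\\s*\"([^\"]*)\"";
    // \x22 is the regex escape for the " character, so no quoting is needed.
    public const string Hex = @"sDatabaseServer\s*=\s*\x22([^\x22]*)\x22";

    // Hypothetical helper: the captured server name, or "" if nothing matches.
    public static string Server(string pattern, string line)
    {
        return Regex.Match(line, pattern).Groups[1].Value;
    }

    public static void Main()
    {
        // Hypothetical config line, for illustration only.
        string line = "sDatabaseServer = \"db01\"";

        // Both literals compile to the same pattern text.
        Console.WriteLine(Verbatim == Regular);

        Console.WriteLine(Server(Verbatim, line));
        Console.WriteLine(Server(Regular, line));
        Console.WriteLine(Server(Hex, line));
    }
}
```

Which spelling to use is mostly a readability choice; the regex engine ends up with an equivalent pattern in all three cases.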
I have a regular expression with the following pattern in C#
Regex param = new Regex(@"^-|^/|=|:");
Basically, it's for command line parsing.
If I pass the command line arguments below, it splits C: as well.
/Data:SomeData /File:"C:\Somelocation"
How do I make it to not apply to characters inside double or single quotes ?
You can do this in two steps:
Use the first regex
Regex args = new Regex("[/-](?=(?:[^\"]*\"[^\"]*\")*[^\"]*$)");
to split the string into the different arguments. Then use the regex
Regex param = new Regex("[=:](?=(?:[^\"]*\"[^\"]*\")*[^\"]*$)");
to split each of the arguments into parameter/value pairs.
Explanation:
[=:] # Split on this regex...
(?= # ...only if the following matches afterwards:
(?: # The following group...
[^"]*" # any number of non-quote characters, then one quote
[^"]*" # repeat, to ensure even number of quotes
)* # ...repeated any number of times, including zero,
[^"]* # followed by any number of non-quotes
$ # until the end of the string.
) # End of lookahead.
Basically, it looks ahead in the string to check whether an even number of quotes follows. If so, we're outside of a string. However, this (somewhat manageable) regex only handles double quotes, and only if there are no escaped quotes inside those.
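The two-step split can be sketched as follows on the command line from the question. The class name, the Parse helper and the choice of a dictionary are made up for illustration; the two regexes are the ones given above, with the shared lookahead factored into a constant.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

public static class ArgSplitDemo
{
    // Lookahead from the answer: succeeds only when an even number of
    // double quotes follows, i.e. when we are outside a quoted string.
    private const string OutsideQuotes = "(?=(?:[^\"]*\"[^\"]*\")*[^\"]*$)";

    private static readonly Regex ArgStart = new Regex("[/-]" + OutsideQuotes);
    private static readonly Regex NameValue = new Regex("[=:]" + OutsideQuotes);

    // Hypothetical helper: maps each parameter name to its raw value.
    public static Dictionary<string, string> Parse(string commandLine)
    {
        var result = new Dictionary<string, string>();

        // Step 1: split into arguments at each unquoted / or -.
        foreach (string raw in ArgStart.Split(commandLine))
        {
            string arg = raw.Trim();
            if (arg.Length == 0) continue; // empty piece before the first switch

            // Step 2: split at the first unquoted = or : (at most 2 pieces).
            string[] parts = NameValue.Split(arg, 2);
            result[parts[0]] = parts.Length > 1 ? parts[1] : "";
        }
        return result;
    }

    public static void Main()
    {
        string cmd = @"/Data:SomeData /File:""C:\Somelocation""";
        foreach (var pair in Parse(cmd))
        {
            Console.WriteLine(pair.Key + " = " + pair.Value);
        }
    }
}
```

The colon after C is followed by a single quote, so the lookahead fails there and the path stays in one piece, quotes included.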
The following regex handles single and double quotes, including escaped quotes, correctly. But I guess you'll agree that if anybody ever finds this in production code, I'm guaranteed a feature article on The Daily WTF:
Regex param = new Regex(
@"[=:]
(?= # Assert even number of (relevant) single quotes, looking ahead:
(?:
(?:\\.|""(?:\\.|[^""\\])*""|[^\\'""])*
'
(?:\\.|""(?:\\.|[^""'\\])*""|[^\\'])*
'
)*
(?:\\.|""(?:\\.|[^""\\])*""|[^\\'])*
$
)
(?= # Assert even number of (relevant) double quotes, looking ahead:
(?:
(?:\\.|'(?:\\.|[^'\\])*'|[^\\'""])*
""
(?:\\.|'(?:\\.|[^'""\\])*'|[^\\""])*
""
)*
(?:\\.|'(?:\\.|[^'\\])*'|[^\\""])*
$
)",
RegexOptions.IgnorePatternWhitespace);
Further explanation of this monster here.
You should read "Mastering Regular Expressions" to understand why there's no general solution to your question. Regexes cannot handle that to an arbitrary depth. As soon as you start to escape the escape character or to escape the escaping of the escape character or ... you're lost. Your use case needs a parser and not a regex.