What regex for matching words with keyword '('? - c#

In my C# code I need to get a word if the words before it match specific words:
var match = Regex.Match(someLine, @"^(FIRST WORDS) (\w+) (SECOND WORDS | PROBLEM KEYWORD \() (\w+)", RegexOptions.IgnoreCase);
var neededWord= match.Groups[4].Value;
If the string equals "FIRST WORDS SOME WORDS PROBLEM KEYWORD (SOMETHING AGAIN)", I would like to get 'SOMETHING' as my needed word. But this does not work. It returns an empty string.
What am I doing wrong?

^FIRST WORDS[^\(]+\(([^\)]+)\)
Description
^ asserts position at the start of the string
FIRST WORDS matches the characters FIRST WORDS literally (case sensitive)
[^\(]+ matches a single character that is not ( (the only character in the negated list)
Quantifier: between one and unlimited times, as many times as possible, giving back as needed [greedy]
\( matches the character ( literally
1st capturing group ([^\)]+)
[^\)]+ matches a single character that is not ) (the only character in the negated list)
Quantifier: between one and unlimited times, as many times as possible, giving back as needed [greedy]
\) matches the character ) literally
Note: Group 1 will contain the text inside the parentheses (SOMETHING AGAIN). If you need only the word SOMETHING, I can edit the RegEx.
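A minimal C# sketch of using the answer's pattern on the question's input. The helper names are hypothetical, and the second helper, which captures only the first word after the ( with (\w+), is an assumed variant rather than part of the answer.

```csharp
using System;
using System.Text.RegularExpressions;

// Group 1 of the answer's pattern: all text between the '(' and the next ')'.
static string TextInParens(string line)
{
    Match m = Regex.Match(line, @"^FIRST WORDS[^\(]+\(([^\)]+)\)");
    return m.Success ? m.Groups[1].Value : string.Empty;
}

// Assumed variant (not from the answer): capture only the first word after '('.
static string FirstWordInParens(string line)
{
    Match m = Regex.Match(line, @"^FIRST WORDS[^\(]+\((\w+)");
    return m.Success ? m.Groups[1].Value : string.Empty;
}

string input = "FIRST WORDS SOME WORDS PROBLEM KEYWORD (SOMETHING AGAIN)";
Console.WriteLine(TextInParens(input));      // SOMETHING AGAIN
Console.WriteLine(FirstWordInParens(input)); // SOMETHING
```

Like the answer, the sketch is case sensitive; pass RegexOptions.IgnoreCase, as the question does, if the keywords can vary in case.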

Related

Regex for football teams representing two participants playing against each other

My task is to validate this:
EventName is a string representing two participants playing against each other.
I was wondering how I can do this with Data Annotations. So far I have this:
[RegularExpression("^[^0-9]+$", ErrorMessage = "Name cannot contain numbers")]
[Required, MinLength(10), MaxLength(150)]
I think I should allow only one delimiter (either "-", ":" or a space) between the two teams. Anything else should not be allowed: no special characters, no numbers.
Could someone point me to the right direction?
Also does anyone know if football teams can have digits in their names?
Something like Villa1874 - Levski1914 ?
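To allow a single delimiter (:, - or a single space, any of which can be surrounded by spaces) and digits/underscores in the team names:
^\w+ *[ :-] *\w+$
Explanation:
^ asserts position at the start of the string
\w+ matches any word character (1 or more); for ASCII input this is [a-zA-Z0-9_], although in .NET \w also matches Unicode letters and digits
' *' (a space followed by *) matches zero or more space characters
[ :-] matches a single character present in the list: a space, : or -. The - is placed last so that it is taken literally; written as [ -:] it would be the range from space to :, which also includes characters such as !, ( and the digits 0-9
' *' matches zero or more space characters
\w+ matches any word character (1 or more)
$ asserts position at the end of the string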
Test: https://regex101.com/r/gBZDLE/6
If you don't want to allow digits or underscores in the team names, you can replace \w with [A-Za-z]:
^[A-Za-z]+ *[ :-] *[A-Za-z]+$
Test: https://regex101.com/r/gBZDLE/7
The previous pattern doesn't allow spaces in the team names. To allow them, you may want to match [A-Za-z]+(?: [A-Za-z]+)* instead. This matches any name that starts with an alphabetic character and contains one or more words separated by single spaces (e.g. "Manchester" or "Manchester United"). Full regex:
^[A-Za-z]+(?: [A-Za-z]+)* *[ :-] *[A-Za-z]+(?: [A-Za-z]+)*$
Test: https://regex101.com/r/gBZDLE/5
Explanation of [A-Za-z]+(?: [A-Za-z]+)*:
[A-Za-z]+ matches one or more alphabetic characters
(?: [A-Za-z]+)* is a non-capturing group (?: ) that matches a single space followed by one or more alphabetic characters ([A-Za-z]+); the * repeats the group zero or more times
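A quick C# check of the character-class detail: in .NET, [ -:] is the range from space (U+0020) to colon (U+003A), which also covers characters such as !, (, / and the digits 0-9, while [ :-] (hyphen last) is only the three characters space, : and -. The sketch compares both forms of the letters-only, multi-word pattern; IsValidEventName is a hypothetical helper. The corrected pattern string could then go into a [RegularExpression(...)] attribute, which, as far as I know, requires the match to cover the whole value.

```csharp
using System;
using System.Text.RegularExpressions;

// Hyphen placed last in the class, so it is a literal '-', not a range.
const string Fixed = @"^[A-Za-z]+(?: [A-Za-z]+)* *[ :-] *[A-Za-z]+(?: [A-Za-z]+)*$";
// Same pattern with [ -:], i.e. every character from ' ' (U+0020) to ':' (U+003A).
const string Ranged = @"^[A-Za-z]+(?: [A-Za-z]+)* *[ -:] *[A-Za-z]+(?: [A-Za-z]+)*$";

// Hypothetical helper wrapping the corrected pattern.
bool IsValidEventName(string name) => Regex.IsMatch(name, Fixed);

string[] samples = { "Manchester United - Chelsea", "Villa:Levski", "Villa!Levski", "Villa5Levski" };
foreach (string s in samples)
{
    Console.WriteLine($"{s}: fixed={IsValidEventName(s)}, ranged={Regex.IsMatch(s, Ranged)}");
}
```

The ranged version accepts "Villa!Levski" and even "Villa5Levski", which the question explicitly wants to reject.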

C# regex string that is not another string

I want to match a word of at least 3 letters, preceded by any number of characters from the class [-_ :], that is not the specific 3-letter word string2.
Ex:
if string2="VER"
in
" ODO VER7"
matched " ODO"
or
"_::ATTPQ VER7"
matched "_::ATTPQ"
but if
" VER7"
it shouldn't match " VER"
so I thought about
Regex.Match(inputString, @"[-_:]*[A-Z]{3,}[^(VER)]", RegexOptions.IgnoreCase);
where
[-_:]* checks for any character in class, appearing 0 or more times
[A-Z] the range of letters that could form the word
{3,} the minimum amount of letters to form the word
[^(VER)] the grouping construct that shouldn't appear
I believe, however, that [A-Z]{3,} results in any letter at least 3 times (not what I want),
and I'm not sure what [^(VER)] is doing.
Using [^(VER)] means a negated character class, which matches any single character except (, ), V, E or R.
For your example data, you could match 0+ spaces or tabs (or use \s to also match a newline).
Then use a negative lookahead before matching A-Z 3 or more times, to assert that what follows is not VER.
If that is the case, match A-Z 3 or more times, followed by a space and VER itself.
^[ \t]*[-_:]*(?!VER)[A-Z]{3,} VER
^\s*[-_:]*(?!VER)[A-Z]{3,}
This regex asserts that, from the start of the string, there are zero or more whitespace characters and zero or more of your characters, followed by at least 3 letters. It uses a negative lookahead to make sure that those letters do not start with VER (or whatever you want).
This would match one or more of the preceding class characters [-_ :] followed by 3 or more letters/digits
that do not start with VER (as in the samples given):
[-_ :]+(?!VER)[^\W_]{3,}
https://regex101.com/r/wLw23I/1
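Since the question treats string2 as a variable, here is a C# sketch that splices it into the lookahead pattern ^\s*[-_:]*(?!VER)[A-Z]{3,} from the answers above. Regex.Escape is used in case string2 ever contains regex metacharacters, which is an assumption beyond the question; FirstWordNotExcluded is a hypothetical name.

```csharp
using System;
using System.Text.RegularExpressions;

// Hypothetical helper: returns the leading class characters plus a 3+ letter
// word, unless that word starts with string2 (checked by the negative lookahead).
static string FirstWordNotExcluded(string input, string string2)
{
    string pattern = @"^\s*[-_:]*(?!" + Regex.Escape(string2) + @")[A-Z]{3,}";
    Match m = Regex.Match(input, pattern, RegexOptions.IgnoreCase);
    return m.Success ? m.Value : string.Empty;
}

Console.WriteLine($"[{FirstWordNotExcluded(" ODO VER7", "VER")}]");     // [ ODO]
Console.WriteLine($"[{FirstWordNotExcluded("_::ATTPQ VER7", "VER")}]"); // [_::ATTPQ]
Console.WriteLine($"[{FirstWordNotExcluded(" VER7", "VER")}]");         // []
```

Because the lookahead only checks the start of the word, a word such as VERB would be rejected too; add a \b after the escaped string2 inside the lookahead if only the exact word should be excluded.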

Regex to return the word before the match

I've been trying to extract the word before the match. For example, I have the following sentence:
"Allatoona was a town located in extreme southeastern Bartow County, Georgia."
I want to extract the word before "Bartow".
I've tried the following regex to extract that word:
\w\sCounty,
What I get returned is "w County" when what I wanted is just the word Bartow.
Any assistance would be greatly appreciated. Thanks!
You can use this regex with a lookahead to find word before County:
\w+(?=\s+County)
(?=\s+County) is a positive lookahead that asserts presence of 1 or more whitespaces followed by word County ahead of current match.
If you want to avoid lookahead then you can use a capture group:
(\w+)\s+County
and extract captured group #1 from match result.
Your \w\sCounty, regex returns w County because \w matches a single character that is either a letter, digit, or _. It does not match a whole word.
To match 1 or more characters, you need the + quantifier, and to capture the part you need to extract you can rely on capturing groups, (...).
So, you can fix your pattern by simply replacing \w with (\w+) and then, after getting a match, accessing Match.Groups[1].Value.
However, if the county name contains a non-word symbol, like a hyphen, \w+ won't match it. A \S+ matching 1 or more non-whitespace symbols might turn out a better option in that case.
See a C# demo:
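string s = "Allatoona was a town located in extreme southeastern Bartow County, Georgia.";
var m = Regex.Match(s, @"(\S+)\s+County");
if (m.Success)
{
    Console.WriteLine(m.Groups[1].Value); // Bartow
}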
You can use this regex to find the word before County:
([\w]*.?\s+).?County
[\w]* matches any word characters, zero or more times
.? allows for an optional special character in the sentence, such as , . or !
\s+ matches the blank spaces (so it also works if there is a double space in the sentence)
.? before County allows for an optional special character there
If you want to find more than one word, add a {n} quantifier after the group, like this: ([\w]*.?\s+){3}.?County
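To compare the lookahead and capture-group patterns from the answers above, here is a C# sketch run on the question's sentence and on an extra test sentence with a hyphenated county name (Miami-Dade). As noted above, \w+ stops at the hyphen, while \S+ keeps the whole name.

```csharp
using System;
using System.Text.RegularExpressions;

string[] sentences =
{
    "Allatoona was a town located in extreme southeastern Bartow County, Georgia.",
    "Hialeah is a city in Miami-Dade County, Florida." // extra test sentence
};

foreach (string s in sentences)
{
    // Lookahead: the match itself is the word; "County" is only asserted.
    string viaLookahead = Regex.Match(s, @"\w+(?=\s+County)").Value;
    // Capture group with \S+: the word is read from Groups[1].
    string viaGroup = Regex.Match(s, @"(\S+)\s+County").Groups[1].Value;
    Console.WriteLine($"{viaLookahead} | {viaGroup}");
}
```

For the first sentence both print Bartow; for the second, the lookahead version returns only Dade.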

Parsing text between quotes with .NET regular expressions

I have the following input text:
@"This is some text #foo=bar #name=""John \""The Anonymous One\"" Doe"" #age=38"
I would like to parse the values with the #name=value syntax as name/value pairs. Parsing the previous string should result in the following named captures:
name:"foo"
value:"bar"
name:"name"
value:"John \""The Anonymous One\"" Doe"
name:"age"
value:"38"
I tried the following regex, which got me almost there:
@"(?:(?<=\s)|^)#(?<name>\w+[A-Za-z0-9_-]+?)\s*=\s*(?<value>[A-Za-z0-9_-]+|(?="").+?(?=(?<!\\)""))"
The primary issue is that it captures the opening quote in "John \""The Anonymous One\"" Doe". I feel like this should be a lookbehind instead of a lookahead, but that doesn't seem to work at all.
Here are some rules for the expression:
A name must start with a letter and can contain any letter, digit, underscore, or hyphen.
An unquoted value must have at least one character and can contain any letter, digit, underscore, or hyphen.
A quoted value can contain any character, including any whitespace and escaped quotes.
Edit:
Here's the result from regex101.com:
(?:(?<=\s)|^)#(?<name>\w+[A-Za-z0-9_-]+?)\s*=\s*(?<value>(?<!")[A-Za-z0-9_-]+|(?=").+?(?=(?<!\\)"))
(?:(?<=\s)|^) Non-capturing group
# matches the character # literally
(?<name>\w+[A-Za-z0-9_-]+?) Named capturing group name
\s* match any white space character [\r\n\t\f ]
= matches the character = literally
\s* match any white space character [\r\n\t\f ]
Quantifier: * Between zero and unlimited times, as many times as possible, giving back as needed [greedy]
(?<value>(?<!")[A-Za-z0-9_-]+|(?=").+?(?=(?<!\\)")) Named capturing group value
1st Alternative: [A-Za-z0-9_-]+
[A-Za-z0-9_-]+ match a single character present in the list below
Quantifier: + Between one and unlimited times, as many times as possible, giving back as needed [greedy]
A-Z a single character in the range between A and Z (case sensitive)
a-z a single character in the range between a and z (case sensitive)
0-9 a single character in the range between 0 and 9
_- a single character in the list _- literally
2nd Alternative: (?=").+?(?=(?<!\\)")
(?=") Positive Lookahead - Assert that the regex below can be matched
" matches the characters " literally
.+? matches any character (except newline)
Quantifier: +? Between one and unlimited times, as few times as possible, expanding as needed [lazy]
(?=(?<!\\)") Positive Lookahead - Assert that the regex below can be matched
(?<!\\) Negative Lookbehind - Assert that it is impossible to match the regex below
\\ matches the character \ literally
" matches the characters " literally
You can use a very useful .NET regex feature where multiple same-named captures are allowed. Also, there is an issue with your (?<name>) capture group: it allows a digit in the first position, which does not meet your 1st requirement.
So, I suggest:
(?si)(?:(?<=\s)|^)#(?<name>\w+[a-z0-9_-]+?)\s*=\s*(?:(?<value>[a-z0-9_-]+)|(?:"")?(?<value>.+?)(?=(?<!\\)""))
Note that you cannot debug .NET-specific regexes at regex101.com, you need to test them in .NET-compliant environment.
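A C# sketch of reading the results through the same-named value group. One assumed change, not from the answer: the name group is written as [a-z][a-z0-9_-]*, so that (with (?i)) a name must start with a letter, as the first rule requires; the answer's \w+[a-z0-9_-]+? would still accept a leading digit.

```csharp
using System;
using System.Text.RegularExpressions;

// Input from the question, as a C# verbatim string ("" is an escaped quote).
string input = @"This is some text #foo=bar #name=""John \""The Anonymous One\"" Doe"" #age=38";

// The answer's pattern, except for the assumed [a-z][a-z0-9_-]* name group.
string pattern = @"(?si)(?:(?<=\s)|^)#(?<name>[a-z][a-z0-9_-]*)\s*=\s*(?:(?<value>[a-z0-9_-]+)|(?:"")?(?<value>.+?)(?=(?<!\\)""))";

foreach (Match m in Regex.Matches(input, pattern))
{
    // Both alternatives write to the group named "value", so it holds
    // whichever alternative actually matched.
    Console.WriteLine($"name: {m.Groups["name"].Value}, value: {m.Groups["value"].Value}");
}
```

This prints three pairs: foo/bar, name/John \"The Anonymous One\" Doe (opening quote not included) and age/38.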
Use string methods such as Split:
string myLongString = @"This is some text #foo=bar #name=""John \""The Anonymous One\"" Doe"" #age=38";
string[] nameValues = myLongString.Split('#');
From there, either use Split with '=' or use IndexOf('=').
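A sketch of the Split/IndexOf approach in C#. Stripping one pair of surrounding quotes from a value is an assumption added here, and this simple split would break if a value ever contained # itself.

```csharp
using System;
using System.Collections.Generic;

string myLongString = @"This is some text #foo=bar #name=""John \""The Anonymous One\"" Doe"" #age=38";

var pairs = new List<(string Name, string Value)>();
string[] parts = myLongString.Split('#');

// parts[0] is the text before the first '#', so start at index 1.
for (int i = 1; i < parts.Length; i++)
{
    int eq = parts[i].IndexOf('=');
    if (eq < 0) continue; // not a name=value piece

    string name = parts[i].Substring(0, eq).Trim();
    string value = parts[i].Substring(eq + 1).Trim();

    // Assumption: a quoted value is wrapped in exactly one pair of quotes.
    if (value.Length >= 2 && value[0] == '"' && value[value.Length - 1] == '"')
        value = value.Substring(1, value.Length - 2);

    pairs.Add((name, value));
}

foreach (var pair in pairs)
    Console.WriteLine($"{pair.Name} = {pair.Value}");
```

Using IndexOf rather than Split('=') keeps any = characters inside a value intact.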

Using Regular Expression Match a String that contains numbers letters and dashes

I need to match this string 011Q-0SH3-936729 but not 345376346 or asfsdfgsfsdf
It has to contain letters AND digits AND dashes.
Pattern could be 011Q-0SH3-936729 or 011Q-0SH3-936729-SDF3 or 000-222-AAAA or 011Q-0SH3-936729-011Q-0SH3-936729-011Q-0SH3-936729-011Q-0SH3-936729 and I want it to be able to match any one of those. The reason is that I don't really know whether the format is fixed, and I have no way of finding out either, so I need to come up with a generic solution for a pattern with any number of dashes and the pattern recurring any number of times.
Sorry this is probably a stupid question, but I really suck at Regular expressions.
TIA
foundMatch = Regex.IsMatch(subjectString,
@"^ # Start of the string
(?=.*\p{L}) # Assert that there is at least one letter
(?=.*\p{N}) # and at least one digit
(?=.*-) # and at least one dash.
[\p{L}\p{N}-]* # Match a string of letters, digits and dashes
$ # until the end of the string.",
RegexOptions.IgnorePatternWhitespace);
should do what you want. If by letters/digits you meant "only ASCII letters/digits" (and not international/Unicode letters, too), then use
foundMatch = Regex.IsMatch(subjectString,
@"^ # Start of the string
(?=.*[A-Z]) # Assert that there is at least one letter
(?=.*[0-9]) # and at least one digit
(?=.*-) # and at least one dash.
[A-Z0-9-]* # Match a string of letters, digits and dashes
$ # until the end of the string.",
RegexOptions.IgnorePatternWhitespace | RegexOptions.IgnoreCase);
EDIT:
This will match any of the keys provided in your comments:
^[0-9A-Z]+(-[0-9A-Z]+)+$
This means the key starts with a digit or letter and has at least one dash.
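A C# sketch contrasting Tim's ASCII pattern (written on one line here, same logic as the commented version) with the pattern ^[0-9A-Z]+(-[0-9A-Z]+)+$. The latter does not require both a letter and a digit, so it accepts 345-3763-46; Tim's pattern does not check where the dashes go, so it accepts -A-0. The strings other than those from the question are illustrative, and the helper names are hypothetical.

```csharp
using System;
using System.Text.RegularExpressions;

// Tim's ASCII-only pattern on one line: at least one letter, one digit and one
// dash somewhere, and nothing but letters, digits and dashes overall.
const string Strict = @"^(?=.*[A-Z])(?=.*[0-9])(?=.*-)[A-Z0-9-]*$";
// Alphanumeric runs separated by single dashes.
const string Runs = @"^[0-9A-Z]+(-[0-9A-Z]+)+$";

bool StrictMatch(string s) => Regex.IsMatch(s, Strict, RegexOptions.IgnoreCase);
bool RunsMatch(string s) => Regex.IsMatch(s, Runs, RegexOptions.IgnoreCase);

string[] samples = { "011Q-0SH3-936729", "000-222-AAAA", "345376346", "asfsdfgsfsdf", "345-3763-46", "-A-0" };
foreach (string s in samples)
    Console.WriteLine($"{s}: strict={StrictMatch(s)}, runs={RunsMatch(s)}");
```

If both properties matter, the two checks can simply be combined with &&.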
Without more info about the regularity of the dashes or otherwise, this is the best we can do:
Regex.IsMatch(input, @"[A-Z0-9\-]+\-[A-Z0-9]")
Although this will also match -A-0
Most naive implementation EVER (might get you started):
([0-9]|[A-Z])+(-)([0-9]|[A-Z])+(-)([0-9]|[A-Z])+
Tested with Regex Coach.
EDIT:
That matches only three groups; here is another, slightly better one:
([0-9A-Z]+\-)+([0-9A-Z]+)
Are you applying the regex to a whole string (i.e., validating or filtering)? If so, Tim's answer should put you right. But if you're plucking matches from a larger string, it gets a bit more complicated. Here's how I would do that:
string input = @"Pattern could be 011Q-0SH3-936729 or 011Q-0SH3-936729-SDF3 or 000-222-AAAA or 011Q-0SH3-936729-011Q-0SH3-936729-011Q-0SH3-936729-011Q-0SH3-936729 but not 345-3763-46 or ASFS-DFGS-FSDF or ASD123FGH987.";
Regex pluckingRegex = new Regex(
@"(?<!\S) # start of 'word'
(?=\S*\p{L}) # contains a letter
(?=\S*\p{N}) # contains a digit
(?=\S*-) # contains a hyphen
[\p{L}\p{N}-]+ # gobble up letters, digits and hyphens only
(?!\S) # end of 'word'
", RegexOptions.IgnorePatternWhitespace);
foreach (Match m in pluckingRegex.Matches(input))
{
Console.WriteLine(m.Value);
}
output: 011Q-0SH3-936729
011Q-0SH3-936729-SDF3
000-222-AAAA
011Q-0SH3-936729-011Q-0SH3-936729-011Q-0SH3-936729-011Q-0SH3-936729
The negative lookarounds serve as 'word' boundaries: they ensure the matched substring starts either at the beginning of the string or after a whitespace character ((?<!\S)), and ends either at the end of the string or before a whitespace character ((?!\S)).
The three positive lookaheads work just like Tim's, except they use \S* to skip whatever precedes the first letter/digit/hyphen. We can't use .* in this case because that would allow it to skip to the next word, or the next, etc., defeating the purpose of the lookahead.
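The point about \S* versus .* can be checked with a small C# sketch. The two patterns are the answer's regex written on one line, once as given and once with .* swapped in; the input string is an illustrative assumption.

```csharp
using System;
using System.Text.RegularExpressions;

string input = "ASFS-DFGS-FSDF 345";

// The answer's pattern: lookaheads limited to the current word by \S*.
var withS = new Regex(@"(?<!\S)(?=\S*\p{L})(?=\S*\p{N})(?=\S*-)[\p{L}\p{N}-]+(?!\S)");
// Same pattern with .*: the digit lookahead can now "see" the next word.
var withDot = new Regex(@"(?<!\S)(?=.*\p{L})(?=.*\p{N})(?=.*-)[\p{L}\p{N}-]+(?!\S)");

Console.WriteLine(withS.Matches(input).Count);   // 0
Console.WriteLine(withDot.Matches(input).Count); // 1
Console.WriteLine(withDot.Match(input).Value);   // ASFS-DFGS-FSDF
```

With .*, ASFS-DFGS-FSDF is accepted only because the digits of the following word 345 satisfy the digit lookahead.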
