Matching only first occurrence in C# regex - c#

I want to match only the first Hash="" in this string:
Hash="123"Hash="AEBB1247209BC9E10EA2054F1813DFD7BB9EEF23FEF7C867FCFCEC69CA0C2A6D"Hash="1"
I tried the regex (Hash="[0-9,A-F]+")?
But it always matches all 3 Hash="" occurrences. Is there any way to fix this? I am using the C# Regex library.

Just use Match(), not Matches().
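A minimal sketch of the difference, assuming the cleaned-up pattern Hash="[0-9A-F]+" (the comma and trailing ? from the question are dropped here, and the variable names are only illustrative):

```csharp
using System;
using System.Text.RegularExpressions;

// Input string from the question.
string input = "Hash=\"123\"Hash=\"AEBB1247209BC9E10EA2054F1813DFD7BB9EEF23FEF7C867FCFCEC69CA0C2A6D\"Hash=\"1\"";
// Assumed pattern: the question's regex without the comma and the trailing '?'.
string pattern = "Hash=\"[0-9A-F]+\"";

// Regex.Match stops at the first (leftmost) match.
Match first = Regex.Match(input, pattern);
Console.WriteLine(first.Value);   // Hash="123"

// Regex.Matches enumerates every match in the input.
MatchCollection all = Regex.Matches(input, pattern);
Console.WriteLine(all.Count);     // 3
```

If only the first hit is needed, Match() also avoids scanning the rest of the input.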

The correct answer is to use Regex.Match() instead of Regex.Matches().
The first returns only the first match; the second returns all matches.
However, there is a regex that does the job and returns only the first one, whether you call Match() or Matches():
(?<!Hash="[0-9A-F]+".*)Hash="[0-9A-F]+"
Note: this works in C#, but won't work in most other languages (like Java), because it relies on a variable-length lookbehind, which many other regex engines do not support.
(?<!Hash="[0-9A-F]+".*) is a negative lookbehind: it must not be possible to match Hash="[0-9A-F]+".* to the left of the current position (the part already read). In other words, it only matches the first Hash.
Two things regarding your current regex: (Hash="[0-9,A-F]+")?
The , between 0-9 and A-F is not a separator between the 2 ranges. It just adds , to the valid character set, which is probably not your goal (you'll accept hashes like 0123,456).
The final ? indicates that the whole group is optional. So, if it cannot be matched, it's still a success. As a consequence, strings like abcd will match (and Matches() will return 5 matches of length 0: before a, between a and b, between b and c, etc.).
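The points above can be checked with a short sketch. The input is shortened for readability (an assumption, not the original hash), and the variable names are illustrative:

```csharp
using System;
using System.Text.RegularExpressions;

// Shortened sample input (illustrative, not the original hash value).
string input = "Hash=\"123\"Hash=\"AEBB\"Hash=\"1\"";

// Variable-length negative lookbehind: reject any position that already
// has a complete Hash="..." somewhere to its left.
string firstOnly = "(?<!Hash=\"[0-9A-F]+\".*)Hash=\"[0-9A-F]+\"";
MatchCollection hits = Regex.Matches(input, firstOnly);
Console.WriteLine(hits.Count);      // 1
Console.WriteLine(hits[0].Value);   // Hash="123"

// The comma inside the class is a literal member, not a separator.
bool commaAccepted = Regex.IsMatch("Hash=\"0123,456\"", "^Hash=\"[0-9,A-F]+\"$");
Console.WriteLine(commaAccepted);   // True

// An optional group succeeds with an empty match at every position.
MatchCollection empties = Regex.Matches("abcd", "(Hash=\"[0-9,A-F]+\")?");
Console.WriteLine(empties.Count);   // 5
```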

You can use the below regex to achieve your functionality:
(Hash="([0-9,A-F]+)[^"]*).*"
Explanation:
1st Capturing Group (Hash="([0-9,A-F]+)[^"]*)
Hash=" matches the characters Hash=" literally (case sensitive)
2nd Capturing Group ([0-9,A-F]+):
Match a single character present in the list below [0-9,A-F]+
+ Quantifier — Matches between one and unlimited times, as many times as possible, giving back as needed (greedy)
0-9 a single character in the range between 0 (index 48) and 9 (index 57) (case sensitive)
, matches the character , literally (case sensitive)
A-F a single character in the range between A (index 65) and F (index 70) (case sensitive)
Match a single character not present in the list below [^"]*
* Quantifier — Matches between zero and unlimited times, as many times as possible, giving back as needed (greedy)
" matches the character " literally (case sensitive)
.* matches any character (except for line terminators)
* Quantifier — Matches between zero and unlimited times, as many times as possible, giving back as needed (greedy)
" matches the character " literally (case sensitive)
Working example: https://regex101.com/r/XTqhyU/2
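With this pattern the overall match spans the whole input, so the first hash has to be read from the capture groups. A rough sketch, using a shortened input as an assumption:

```csharp
using System;
using System.Text.RegularExpressions;

// Shortened sample input (illustrative).
string input = "Hash=\"123\"Hash=\"AEBB1247\"Hash=\"1\"";
Match m = Regex.Match(input, "(Hash=\"([0-9,A-F]+)[^\"]*).*\"");

// The greedy .* swallows the remaining Hash entries, so there is one match.
Console.WriteLine(m.Value == input);    // True
// Group 1 stops before the closing quote of the first hash.
Console.WriteLine(m.Groups[1].Value);   // Hash="123
// Group 2 holds just the hash value.
Console.WriteLine(m.Groups[2].Value);   // 123
```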

Related

C# regex string that is not another string

I want to match an at least 3 letter word, preceded by any character from class [-_ :] any amount of times, that is not this specific 3 letter word string2.
Ex:
if string2="VER"
in
" ODO VER7"
matched " ODO"
or
"_::ATTPQ VER7"
matched "_::ATTPQ"
but if
" VER7"
it shouldn't match " VER"
so I thought about
Regex.Match(inputString, @"[-_:]*[A-Z]{3,}[^(VER)]", RegexOptions.IgnoreCase);
where
[-_:]* checks for any character in class, appearing 0 or more times
[A-Z] the range of letters that could form the word
{3,} the minimum amount of letters to form the word
[^(VER)] the grouping construct that shouldn't appear
I believe, however, that [A-Z]{3,} results in any letter at least 3 times (not what I want)
and I'm not sure what [^(VER)] is doing.
Using [^(VER)] means a negated character class where you would match any character except ( ) V E or R
For your example data, you could match 0+ spaces or tabs (or use \s to also match a newline).
Then use a negative lookahead before matching 3 or more times A-Z to assert what is on the right is not VER.
If that is the case, match 3 or more times A-Z followed by a space and VER itself.
^[ \t]*[-_:]*(?!VER)[A-Z]{3,} VER
^\s*[-_:]*(?!VER)[A-Z]{3,}
This regex asserts that, starting from the beginning of the string, there are zero or more whitespace characters and zero or more of your class characters, followed by at least 3 letters. It uses a negative lookahead to make sure that the letters do not start with VER (or whatever word you want to exclude).
This matches one or more of the preceding class characters [-_ :] followed by 3 or more letters/digits
that do not start with VER (as in the samples given):
[-_ :]+(?!VER)[^\W_]{3,}
https://regex101.com/r/wLw23I/1
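As a sketch of how the lookahead version behaves on the samples from the question, here is the last pattern built from a variable (the name excluded is hypothetical, standing in for string2). Note that (?!VER) also rejects longer words that merely start with VER.

```csharp
using System;
using System.Text.RegularExpressions;

// Hypothetical variable playing the role of string2 from the question.
string excluded = "VER";
// Regex.Escape guards against regex metacharacters in the excluded word.
string pattern = @"[-_ :]+(?!" + Regex.Escape(excluded) + @")[^\W_]{3,}";

string odo = Regex.Match(" ODO VER7", pattern).Value;       // " ODO"
string att = Regex.Match("_::ATTPQ VER7", pattern).Value;   // "_::ATTPQ"
bool ver = Regex.Match(" VER7", pattern).Success;           // false

Console.WriteLine("[" + odo + "]");   // [ ODO]
Console.WriteLine("[" + att + "]");   // [_::ATTPQ]
Console.WriteLine(ver);               // False

// By contrast, [^(VER)] is a negated character class: it matches any single
// character other than '(', ')', 'V', 'E' or 'R'.
Console.WriteLine(Regex.IsMatch("V", "[^(VER)]"));   // False
Console.WriteLine(Regex.IsMatch("X", "[^(VER)]"));   // True
```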

Regular Expression to Validate if it is not a Decimal or Integer Number

I am trying to find a regular expression to find out the number which is not entered as proper decimal or integer number in a input box
Examples
1.. - Catch // consecutive Repeating dots
ABC - Catch // All Alphabets
1.1.1- Catch // dots repeating in a number
!,#,#- Catch // All Special Characters
My current code below catches all of the examples except the third one, where decimal dots can be repeated in any combination.
void T1_HTextChanged(object sender, EventArgs e)
{
    if (System.Text.RegularExpressions.Regex.IsMatch(T1_H.Text, "[^0-9.-]+|[.]{2}"))
    {
        MessageBox.Show("Please enter only numbers.");
        T1_H.Text = "";
    }
}
If you really want to use a regexp you can use: ^[0-9]+(\.[0-9]+)?$.
You can test it here https://regex101.com/r/UB6eRT/1
If you want to know if it's a valid number you can also try to convert it and check if you get an error.
Try this regex:
^[0-9]+([.][0-9]{1,2})?$
Explanation:
^ asserts position at start of a line
Match a single character present in the list below [0-9]+
+ Quantifier — Matches between one and unlimited times, as many times as possible,
giving back as needed (greedy)
0-9 a single character in the range between 0 (index 48) and 9 (index 57) (case sensitive)
1st Capturing Group ([.][0-9]{1,2})?
? Quantifier — Matches between zero and one times, as many times as possible, giving back as needed (greedy)
Match a single character present in the list below [.]
. matches the character . literally (case sensitive)
Match a single character present in the list below [0-9]{1,2}
{1,2} Quantifier — Matches between 1 and 2 times, as many times as possible, giving back as needed (greedy)
0-9 a single character in the range between 0 (index 48) and 9 (index 57) (case sensitive)
$ asserts position at the end of a line
Working example: https://regex101.com/r/iRaRPX/1/
It will check for all integers and decimal numbers with up to two decimal places. You can change that according to your requirement.
If you want to achieve this with Regular expression, you can use.
^(\d*\.)?\d+$
But please be aware that you can use Decimal.TryParse as well; see its documentation for details.
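A small sketch comparing the two approaches on the examples from the question. The helper names ByRegex and ByParse are made up for illustration, and restricting TryParse to NumberStyles.AllowDecimalPoint with the invariant culture is an assumption about what counts as a valid number here:

```csharp
using System;
using System.Globalization;
using System.Text.RegularExpressions;

// Regex from the first answer: digits, optionally followed by one dot and more digits.
var number = new Regex(@"^[0-9]+(\.[0-9]+)?$");

// Hypothetical helpers, named only for this example.
bool ByRegex(string s) => number.IsMatch(s);
bool ByParse(string s) => decimal.TryParse(
    s, NumberStyles.AllowDecimalPoint, CultureInfo.InvariantCulture, out _);

foreach (string s in new[] { "1..", "ABC", "1.1.1", "12", "3.14" })
{
    Console.WriteLine(s + ": regex=" + ByRegex(s) + ", TryParse=" + ByParse(s));
}
```

Both checks reject the first three inputs and accept the last two. They can still disagree on edge cases, so it is worth picking one and testing it against the inputs you expect.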

Regular expression, match anything not enclosed in

Given the string foobarbarbarfoobar, I want to have everything between foo. So I used this expression for that and the result is: barbarbar. It's working great.
(?<=foo).*(?=foo)
Now I also want the opposite. So given the string foobarbarbarfoobar I want everything that is not enclosed by foo. I tried the following regular expression:
(?<!foo).*(?!foo)
I expected bar as result but instead it returns a match for foobarbarbarfoobar. It doesn't make sense to me. What am I missing?
The explanation from: https://regex101.com/ looks good to me?
(?<!foo).*(?!foo)
(?<!foo) Negative Lookbehind - Assert that it is impossible to match the regex below
foo matches the characters foo literally (case sensitive)
.* matches any character (except newline)
Quantifier: * Between zero and unlimited times, as many times as possible, giving back as needed [greedy]
(?!foo) Negative Lookahead - Assert that it is impossible to match the regex below
foo matches the characters foo literally (case sensitive)
Any help is really appreciated
I'm hoping someone finds a better approach, but this abomination may do what you want:
(.*)foo(?<=foo).*(?=foo)foo(.*)
The text before the first foo is in capture group 1 (with your provided example this would be empty) and after is in capture group 2 (would be 'bar' in this case)
If you want the 'foo's included on either end, use this instead: (.*)(?<=foo).*(?=foo)(.*). This would result in 'foo' in group 1, and 'foobar' in group 2.
I found a solution for it:
^((?!foo).)+
Explanation from regex101
^ assert position at start of the string
1st Capturing group ((?!foo).)+
Quantifier: + Between one and unlimited times, as many times as possible, giving back as needed
Note: A repeated capturing group will only capture the last iteration. Put a capturing group around the repeated group to capture all iterations or use a non-capturing group instead if you're not interested in the data
(?!foo) Negative Lookahead - Assert that it is impossible to match the regex below
foo matches the characters foo literally (case sensitive)
. matches any character (except newline)
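For reference, a sketch of how the patterns in this thread behave in .NET on the sample string. The negated version matches everything because, at position 0, nothing precedes the match (so (?<!foo) succeeds), .* then consumes the whole string, and nothing follows the end (so (?!foo) succeeds too). The extra input "barfoobaz" is an assumption added for illustration.

```csharp
using System;
using System.Text.RegularExpressions;

string input = "foobarbarbarfoobar";

// Text enclosed by foo ... foo.
string between = Regex.Match(input, "(?<=foo).*(?=foo)").Value;
Console.WriteLine(between);   // barbarbar

// Negating both lookarounds does not invert the result.
string negated = Regex.Match(input, "(?<!foo).*(?!foo)").Value;
Console.WriteLine(negated);   // foobarbarbarfoobar

// Capture-group approach: text before the first foo and after the last foo.
Match outer = Regex.Match(input, "(.*)foo(?<=foo).*(?=foo)foo(.*)");
Console.WriteLine("[" + outer.Groups[1].Value + "]");   // []
Console.WriteLine(outer.Groups[2].Value);               // bar

// ^((?!foo).)+ is anchored at the start, so it finds nothing when the
// input itself begins with foo; it returns the text before the first foo.
bool anchored = Regex.IsMatch(input, "^((?!foo).)+");
string prefix = Regex.Match("barfoobaz", "^((?!foo).)+").Value;
Console.WriteLine(anchored);  // False
Console.WriteLine(prefix);    // bar
```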

Parsing text between quotes with .NET regular expressions

I have the following input text:
@"This is some text #foo=bar #name=""John \""The Anonymous One\"" Doe"" #age=38"
I would like to parse the values with the #name=value syntax as name/value pairs. Parsing the previous string should result in the following named captures:
name:"foo"
value:"bar"
name:"name"
value:"John \""The Anonymous One\"" Doe"
name:"age"
value:"38"
I tried the following regex, which got me almost there:
@"(?:(?<=\s)|^)#(?<name>\w+[A-Za-z0-9_-]+?)\s*=\s*(?<value>[A-Za-z0-9_-]+|(?="").+?(?=(?<!\\)""))"
The primary issue is that it captures the opening quote in "John \""The Anonymous One\"" Doe". I feel like this should be a lookbehind instead of a lookahead, but that doesn't seem to work at all.
Here are some rules for the expression:
Name must start with a letter and can contain any letter, number, underscore, or hyphen.
Unquoted must have at least one character and can contain any letter, number, underscore, or hyphen.
Quoted value can contain any character including any whitespace and escaped quotes.
Edit:
Here's the result from regex101.com:
(?:(?<=\s)|^)#(?<name>\w+[A-Za-z0-9_-]+?)\s*=\s*(?<value>(?<!")[A-Za-z0-9_-]+|(?=").+?(?=(?<!\\)"))
(?:(?<=\s)|^) Non-capturing group
# matches the character # literally
(?<name>\w+[A-Za-z0-9_-]+?) Named capturing group name
\s* match any white space character [\r\n\t\f ]
= matches the character = literally
\s* match any white space character [\r\n\t\f ]
Quantifier: * Between zero and unlimited times, as many times as possible, giving back as needed [greedy]
(?<value>(?<!")[A-Za-z0-9_-]+|(?=").+?(?=(?<!\\)")) Named capturing group value
1st Alternative: [A-Za-z0-9_-]+
[A-Za-z0-9_-]+ match a single character present in the list below
Quantifier: + Between one and unlimited times, as many times as possible, giving back as needed [greedy]
A-Z a single character in the range between A and Z (case sensitive)
a-z a single character in the range between a and z (case sensitive)
0-9 a single character in the range between 0 and 9
_- a single character in the list _- literally
2nd Alternative: (?=").+?(?=(?<!\\)")
(?=") Positive Lookahead - Assert that the regex below can be matched
" matches the characters " literally
.+? matches any character (except newline)
Quantifier: +? Between one and unlimited times, as few times as possible, expanding as needed [lazy]
(?=(?<!\\)") Positive Lookahead - Assert that the regex below can be matched
(?<!\\) Negative Lookbehind - Assert that it is impossible to match the regex below
\\ matches the character \ literally
" matches the characters " literally
You can use a very useful .NET regex feature where multiple same-named captures are allowed. Also, there is an issue with your (?<name>) capture group: it allows a digit in the first position, which does not meet your 1st requirement.
So, I suggest:
(?si)(?:(?<=\s)|^)#(?<name>\w+[a-z0-9_-]+?)\s*=\s*(?:(?<value>[a-z0-9_-]+)|(?:"")?(?<value>.+?)(?=(?<!\\)""))
Note that you cannot debug .NET-specific regexes at regex101.com; you need to test them in a .NET-compliant environment.
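A sketch of running the suggested pattern in .NET, reading both same-named value groups through a single Groups["value"] lookup. The input is the string from the question written as a verbatim literal; the variable names are illustrative.

```csharp
using System;
using System.Text.RegularExpressions;

// Input from the question ("" is an escaped quote inside a verbatim string).
string input = @"This is some text #foo=bar #name=""John \""The Anonymous One\"" Doe"" #age=38";

// Suggested pattern; both alternatives capture into the same group name.
string pattern = @"(?si)(?:(?<=\s)|^)#(?<name>\w+[a-z0-9_-]+?)\s*=\s*(?:(?<value>[a-z0-9_-]+)|(?:"")?(?<value>.+?)(?=(?<!\\)""))";

MatchCollection pairs = Regex.Matches(input, pattern);
foreach (Match m in pairs)
{
    // Only one alternative participates per match, so "value" is unambiguous.
    Console.WriteLine(m.Groups["name"].Value + " => " + m.Groups["value"].Value);
}
// foo => bar
// name => John \"The Anonymous One\" Doe
// age => 38
```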
Use string methods.
Split
string myLongString = @"This is some text #foo=bar #name=""John \""The Anonymous One\"" Doe"" #age=38";
string[] nameValues = myLongString.Split('#');
From there either use Split function with "=" or use IndexOf("=").

What regex for matching words with keyword '('?

In my c# code I need to get a word if the words before match specific words:
var match = Regex.Match(someLine, @"^(FIRST WORDS) (\w+) (SECOND WORDS | PROBLEM KEYWORD \() (\w+)", RegexOptions.IgnoreCase);
var neededWord= match.Groups[4].Value;
If the string equals "FIRST WORDS SOME WORDS PROBLEM KEYWORD (SOMETHING AGAIN)", I would like to get 'SOMETHING' as my needed word. But this does not work. It returns an empty string.
What am I doing wrong?
^FIRST WORDS[^\(]+\(([^\)]+)\)
Description
^ assert position at start of the string
FIRST WORDS matches the characters FIRST WORDS literally (case sensitive)
[^\(]+ match a single character not present in the list below
Quantifier: Between one and unlimited times, as many times as possible, giving back as needed [greedy]
\( matches the character ( literally
\( matches the character ( literally
1st Capturing group ([^\)]+)
[^\)]+ match a single character not present in the list below
Quantifier: Between one and unlimited times, as many times as possible, giving back as needed [greedy]
\) matches the character ) literally
\) matches the character ) literally
Note: Group 1 will contain your requested result. If you need only the word SOMETHING, I can edit the regex.
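A sketch of why the original pattern fails and how the regex above behaves. The original fails on this input for two reasons: (\w+) matches only a single word where the input has "SOME WORDS", and the spaces inside the alternation are literal, so " PROBLEM KEYWORD \(" demands an extra leading space. The last pattern, which captures only the first word after the parenthesis, is an assumed variation and not part of the answer.

```csharp
using System;
using System.Text.RegularExpressions;

string line = "FIRST WORDS SOME WORDS PROBLEM KEYWORD (SOMETHING AGAIN)";

// Original pattern from the question: no match, so Groups[4].Value is empty.
Match original = Regex.Match(line,
    @"^(FIRST WORDS) (\w+) (SECOND WORDS | PROBLEM KEYWORD \() (\w+)",
    RegexOptions.IgnoreCase);
Console.WriteLine(original.Success);             // False

// Regex from the answer: group 1 is everything inside the parentheses.
Match inside = Regex.Match(line, @"^FIRST WORDS[^\(]+\(([^\)]+)\)");
Console.WriteLine(inside.Groups[1].Value);       // SOMETHING AGAIN

// Assumed variation: capture only the first word after the '('.
Match firstWord = Regex.Match(line, @"^FIRST WORDS[^\(]+\((\w+)");
Console.WriteLine(firstWord.Groups[1].Value);    // SOMETHING
```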
