(Regular Expression) Check that a string starts with specified text followed by digits - C#

I'd like to check that a string starts with "Test" and is followed by digits.
I want to do this with Regex but can't get it to work.
Below is an example; I only want to get
"Test2018.txt", "Test2019.txt"
List<string> fileNames = new List<string>() {"Test2018.txt", "Test2019.txt", "TestEvent2018.txt", "TestEvent2019.txt"};
fileNames.Where(p => Regex.IsMatch(p, "Test^[0-9]+*") == true);

You could use this Regex:
^Test[0-9]+\.txt$
Where
^ denotes the start of the line.
Test matches the literal text.
[0-9] matches any numeric digit 0-9.
+ means at least once.
\. matches the period.
txt matches the literal text.
$ denotes the end of the line.
And in C#:
var matchingFiles = fileNames.Where(p => Regex.IsMatch(p, @"^Test[0-9]+\.txt$"));

^ matches the start of the string, so it doesn’t make sense for it to go in the middle of your pattern. I think you also meant .* by *, but there’s no need to check the rest of the string when checking for IsMatch without an ending anchor. (That means [0-9]+ can also become [0-9].)
No need to use == true on booleans, either.
fileNames.Where(p => Regex.IsMatch(p, "^Test[0-9]"))
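The two answers above can be compared with a short sketch that runs both patterns over the file list from the question. It is only an illustration; the variable names `strict` and `loose` are not from the original posts.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text.RegularExpressions;

var fileNames = new List<string>
{
    "Test2018.txt", "Test2019.txt", "TestEvent2018.txt", "TestEvent2019.txt"
};

// Strict: "Test", one or more digits, then ".txt" and nothing else.
var strict = fileNames.Where(p => Regex.IsMatch(p, @"^Test[0-9]+\.txt$")).ToList();

// Loose: only requires "Test" immediately followed by at least one digit.
var loose = fileNames.Where(p => Regex.IsMatch(p, @"^Test[0-9]")).ToList();

Console.WriteLine(string.Join(", ", strict)); // Test2018.txt, Test2019.txt
Console.WriteLine(string.Join(", ", loose));  // Test2018.txt, Test2019.txt
```

Both patterns give the same result for this list. They differ on other input: the loose pattern would also accept a (hypothetical) name such as "Test1abc.doc", while the strict one would not.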

Related

Regex string between two strings includes brackets

I'm struggling with a regular expression.
The test string is e.g.
Path.parent.child[0]
I need to extract "parent".
I always know the start ("Path") and the end ("child[0]") of the test string. The end can also be "child[1]", or have no "[]" at all.
I tried: (?<=Path.).*?(?=.child[0])
But it doesn't find a match. I think the [] in the child string is the problem.
A regex which might work is ^[^.]*\.(.*)\.[^.]*\[?\d?\]?$
^ beginning of the line
[^.]* anything except a dot
\. a literal dot
(.*) anything, captured as a group
\. a literal dot again
[^.]* anything except a dot
\[?\d?\]? optional brackets and index. Could be (\[\d\])? if you don't mind another group.
$ end of the line
Check it on Regex101.com
But, if you always know the start and the end, you could switch to a non-Regex approach:
var start = "Path";
var end = "child[0]";
var s = start+".parent."+end;
// +1 for the first dot, -2 for two dots
var extracted = s.Substring(start.Length+1, s.Length-start.Length-end.Length-2);
Console.WriteLine(extracted);
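The original lookaround attempt fails because an unescaped [0] is a character class that matches the single character 0, so .child[0] looks for ".child0" rather than ".child[0]". A minimal sketch of an escaped variant next to the answer's pattern follows; the `lookaround` pattern is an assumption about what the asker intended, not something taken from the answer.

```csharp
using System;
using System.Text.RegularExpressions;

var input = "Path.parent.child[0]";

// Dots and brackets are escaped so they match literally;
// the index suffix \[\d+\] is optional to cover "child" without brackets.
var lookaround = new Regex(@"(?<=Path\.).*?(?=\.child(\[\d+\])?$)");
Console.WriteLine(lookaround.Match(input).Value); // parent

// The answer's pattern: the middle part ends up in capture group 1.
var grouped = new Regex(@"^[^.]*\.(.*)\.[^.]*\[?\d?\]?$");
Console.WriteLine(grouped.Match(input).Groups[1].Value); // parent
```

The lookaround version returns the value directly as the match, while the anchored version needs `Groups[1]` to read the captured middle part.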

How to match words that don't start or end with certain characters using Regex?

I want to find word matches that don't start or end with specific characters.
For example, I have this input and I only want to match the unquoted word in the middle:
"string" string 'string'
and exclude the other words, which start and end with either " or '.
I am currently using this pattern:
But I do not know what pattern I should use that would exclude words that start and end with some specified characters.
Can someone give me some advice on what pattern I should use? Thank you.
The pattern you're currently using matches the quoted words too, because \b correctly asserts the positions between " and s and between g and " (a word boundary is a position between a word character [a-zA-Z0-9_] and a non-word character). You can use one of the following methods:
Negate specific characters (negative lookbehind/lookahead)
This method allows you to specify a character, set of characters, or substring to negate from a match.
(?<!['"])\bstring\b(?!['"])
(?<!['"]) - ensure neither ' nor " precedes.
(?!['"]) - ensure neither ' nor " follows.
Allow specific characters (positive lookbehind/lookahead)
This method allows you to specify a character, set of characters, or substring to ensure match.
(?<=\s|^)\bstring\b(?=\s|$)
(?<=\s|^) - ensure whitespace or the beginning of the line precedes.
(?=\s|$) - ensure whitespace or the end of the line follows.
A combination of both above
This method allows you to negate specific cases while allowing others (not commonly used and not really needed for the problem presented, but it may be useful to you or others).
Something like (?<=\s|^)string(?=\s+(?!stop)|$) would ensure the word isn't followed by the word stop
Something like (?<=(?<!stop\s*)\s+|^)string(?=\s+|$) would ensure the word doesn't follow the word stop. Note that quantifiers (\s+) in lookbehinds are not allowed in most regex engines, but .NET allows them.
Something like (?<=\s|^)\bstring\b(?=\s|$)(?!\z) would ensure the word isn't at the end of the string (which differs from the end of a line in multiline mode).
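The first two methods can be tried against the input from the question with a small sketch like the one below. The variable names `notQuoted` and `spaced` are illustrative only.

```csharp
using System;
using System.Linq;
using System.Text.RegularExpressions;

var input = "\"string\" string 'string'";

// Negative lookarounds: reject a quote character directly before or after the word.
var notQuoted = new Regex(@"(?<!['""])\bstring\b(?!['""])");

// Positive lookarounds: require whitespace or a string boundary on both sides.
var spaced = new Regex(@"(?<=\s|^)\bstring\b(?=\s|$)");

foreach (var regex in new[] { notQuoted, spaced })
{
    // Print the start index of every match; only the middle word (index 9) qualifies.
    var indexes = regex.Matches(input).Cast<Match>().Select(m => m.Index);
    Console.WriteLine(string.Join(", ", indexes)); // 9
}
```

Inside a verbatim string a double quote is written as `""`, which is why the character class appears as `['""]` in the C# source.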
This regex will pick string if it is between spaces: \sstring\s
var sample = "\"string\" string \"string\" astring 'string_ string?string string ";
var regx = new Regex(@"\sstring\s");
var matches = regx.Matches(sample);
foreach (Match mt in matches)
{
    Console.WriteLine($"{mt.Value} {mt.Index,3} {mt.Length,3}");
}

How can I filter out certain combinations?

I'm trying to filter the input of a TextBox using a Regex. I need up to three numbers before the decimal point and I need two after it. This can be in any form.
I've tried changing the regex commands around, but it creates errors and single inputs won't be valid. I'm using a TextBox in WPF to collect the data.
bool containsLetter = Regex.IsMatch(units.Text, "^[0-9]{1,3}([.] [0-9] {1,3})?$");
if (containsLetter == true)
{
MessageBox.Show("error");
}
return containsLetter;
I want the regex filter to accept these types of inputs:
111.11,
11.11,
1.11,
1.01,
100,
10,
1,
As mentioned in the comments, spaces in a regex pattern are matched literally.
Therefore in this part of your regex:
([.] [0-9] {1,3})
a space is expected between . and [0-9],
and the {1,3} after [0-9] applies to the space in front of it, so the regex expects 1 to 3 spaces there.
That being said, if the spaces were added for readability, there are several ways to construct a readable regex.
1) Put the comments out of the regex:
string myregex = @"\s" // Match any whitespace once
+ @"\n" // Match one newline character
+ @"[a-zA-Z]"; // Match any letter
2) Add comments within your regex by using the syntax (?#comment)
needle(?# this will find a needle)
3) Activate free-spacing mode within your regex:
nee # this will find a nee...
dle # ...dle (the split means nothing when white-space is ignored)
Documentation: https://www.regular-expressions.info/freespacing.html
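Applied to the pattern from the question, free-spacing mode could look like the sketch below. It assumes the intended rule is one to three digits, optionally followed by a point and exactly two digits (the question's own attempt allowed one to three); `amount` is an illustrative name.

```csharp
using System;
using System.Text.RegularExpressions;

// With IgnorePatternWhitespace, unescaped whitespace and #-comments
// inside the pattern are ignored, so the pattern can be laid out freely.
var amount = new Regex(@"
    ^[0-9]{1,3}        # one to three digits before the decimal point
    ([.][0-9]{2})?     # optional: a point followed by exactly two digits
    $                  # nothing else may follow",
    RegexOptions.IgnorePatternWhitespace);

foreach (var text in new[] { "111.11", "1.01", "100", "1", "1.1", "1111", "1. 11" })
{
    // The first four inputs are accepted, the last three are rejected.
    Console.WriteLine($"{text,-7} {amount.IsMatch(text)}");
}
```

Without the option, the same effect is achieved by simply removing the spaces: ^[0-9]{1,3}([.][0-9]{2})?$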

Regex Match all characters until reach character, but also include last match

I'm trying to find all Color Hex codes using Regex.
I have this string value for example - #FF0000FF#0038FFFF#51FF00FF#F400FFFF and I use this:
#.+?(?=#)
pattern to match all characters until it reaches #, but it stops at the last character, which should be the last match.
I'm kind of new to this Regex stuff. How could I also get the last match?
Your regex does not match the last value because your regex (with the positive lookahead (?=#)) requires a # to appear after an already consumed value, and there is no # at the end of the string.
You may use
#[^#]+
The [^#] negated character class matches any char but # (+ means 1 or more occurrences) and does not require a # to appear immediately to the right of the currently matched value.
In C#, you may collect all matches using
var result = Regex.Matches(s, @"#[^#]+")
.Cast<Match>()
.Select(x => x.Value)
.ToList();
A more precise pattern you may use is #[A-Fa-f0-9]{8}; it matches a # followed by exactly 8 hex characters (digits, or letters a to f in either case).
Don't rely on whatever characters follow the #; match hex characters explicitly and it will work every time.
(?i)#[a-f0-9]+
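As a quick check, the stricter eight-digit pattern can be run over the sample string from the question. This is a sketch only; the variable name `colors` is illustrative.

```csharp
using System;
using System.Linq;
using System.Text.RegularExpressions;

var s = "#FF0000FF#0038FFFF#51FF00FF#F400FFFF";

// A '#' followed by exactly eight hex digits; no lookahead is needed,
// so the last code in the string is matched like the others.
var colors = Regex.Matches(s, @"#[A-Fa-f0-9]{8}")
    .Cast<Match>()
    .Select(m => m.Value)
    .ToList();

Console.WriteLine(colors.Count);             // 4
Console.WriteLine(string.Join(" ", colors)); // #FF0000FF #0038FFFF #51FF00FF #F400FFFF
```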

C# - Removing single word in string after certain character

I have string that I would like to remove any word following a "\", whether in the middle or at the end, such as:
testing a\determiner checking test one\pronoun
desired result:
testing a checking test one
I have tried a simple regex that removes anything between the backslash and whitespace, but it gives the following result:
string input = @"testing a\determiner checking test one\pronoun";
Regex regex = new Regex(@"\\.*\s");
string output = regex.Replace(input, " ");
Result:
testing a one\pronoun
It looks like this regex matches from the backslash until the last whitespace in the string. I cannot seem to figure out how to match from the backslash to the next whitespace. Also, I am not guaranteed a whitespace at the end, so I would need to handle that. I could continue processing the string and remove any text after the backslash, but I was hoping I could handle both cases with one step.
Any advice would be appreciated.
Change .*, which matches any characters, to \w*, which only matches word characters.
Regex regex = new Regex(@"\\\w*");
string output = regex.Replace(input, "");
".*" matches zero or more characters of any kind. Consider using "\w+" instead, which matches one or more "word" characters (not including whitespace).
Using "+" instead of "*" would allow a backslash followed by a non-"word" character to remain unmatched. For example, no matches would be found in the sentence "Sometimes I experience \ an uncontrollable compulsion \ to intersperse backslash \ characters throughout my sentences!"
With your current pattern, .* tells the parser to be "greedy," that is, to take as much of the string as possible until it hits a space. Adding a ? right after that * tells it instead to make the capture as small as possible--to stop as soon as it hits the first space.
Next, you want to end at not just a space, but at either a space or the end of the string. The $ symbol matches the end of the string, and | means or. Group those together using parentheses and your group collectively tells the parser to stop at either a space or the end of the string. Your code will look like this:
string input = @"testing a\determiner checking test one\pronoun";
Regex regex = new Regex(@"\\.*?(\s|$)");
string output = regex.Replace(input, " ");
Try this regex (\\[^\s]*)
(\\[^\s]*)
1st Capturing group (\\[^\s]*)
\\ matches the character \ literally
[^\s]* match a single character not present in the list below
Quantifier: * Between zero and unlimited times, as many times as possible, giving back as needed [greedy]
\s match any white space character [\r\n\t\f ].
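The three suggested patterns can be compared side by side on the input from the question. The sketch below is illustrative; the `TrimEnd()` call is an addition that is not part of the original answer, needed because replacing a match at the end of the string with a space leaves a trailing space.

```csharp
using System;
using System.Text.RegularExpressions;

var input = @"testing a\determiner checking test one\pronoun";

// Backslash plus the word characters after it; surrounding spaces are untouched.
var byWord = Regex.Replace(input, @"\\\w*", "");

// Backslash plus everything up to the next whitespace or the end of the string.
// The consumed whitespace is put back as a single space, then trimmed at the end.
var lazy = Regex.Replace(input, @"\\.*?(\s|$)", " ").TrimEnd();

// Backslash plus any run of non-whitespace characters.
var nonSpace = Regex.Replace(input, @"\\[^\s]*", "");

Console.WriteLine(byWord);   // testing a checking test one
Console.WriteLine(lazy);     // testing a checking test one
Console.WriteLine(nonSpace); // testing a checking test one
```

The \w* and [^\s]* variants differ only when the text after the backslash contains non-word characters such as punctuation: [^\s]* removes them, \w* stops in front of them.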
