Regex.Match() won't match a substring - c#

This is something simple but I cannot figure this out. I want to find a substring with this regex. It will mach "M4N 3M5", but doesn't match the below :
const string text = "asdf M4N 3M5 adsf";
Regex regex = new Regex(#"^[ABCEGHJKLMNPRSTVXY]{1}\d{1}[A-Z]{1} *\d{1}[A-Z]{1}\d{1}$", RegexOptions.None);
Match match = regex.Match(text);
string value = match.Value;

Try removing ^ and $:
Regex regex = new Regex(#"[ABCEGHJKLMNPRSTVXY]{1}\d{1}[A-Z]{1} *\d{1}[A-Z]{1}\d{1}", RegexOptions.None);
^ : The match must start at the beginning of the string or line.
$ : The match must occur at the end of the string or before \n at the
end of the line or string.
If you want to match only in word boundaries you can use \b as suggested by Mike Strobel:
Regex regex = new Regex(#"\b[ABCEGHJKLMNPRSTVXY]{1}\d{1}[A-Z]{1} *\d{1}[A-Z]{1}\d{1}\b", RegexOptions.None);

I know this question has been answered but i have noticed two thing in your pattern which i want to highlight:
No need to mention the single instance of any token.
For example: (Notice the missing {1})
\d{1} = \d
[A-Z]{1} = [A-Z]
Also I won't recommend you to enter a <space>in your pattern use '\s' instead because if mistakenly a backspace is pressed you might not
be able to figure out the mistake and running code will stop
working.
Personally, for this case i would recommend you to use \b since it is best fit here.

Related

Regex Match for HTML string with newline

I am trying to match:
<h4>Manufacturer</h4>\n\n Gigabyte\n\n\n
My Regex ATM is:
Match regex = Regex.Match(cleanedUpHtml, "Manufacturer(.*?)\n\n\n", RegexOptions.IgnoreCase);
However it does not work.
The (.*?) should match all in between.
Here are 2 things I find important:
Whenever you declare a regex pattern in C#, it is advisable to use string literals, i.e. #"PATTERN". This simplifies writing regex patterns.
RegexOptions.Singleline must be used to treat multiline text as a string, i.e. a dot will match a line break.
Here is my code snippet:
var str = "<h4>Manufacturer</h4>\n\n Gigabyte\n\n\n";
var regex = Regex.Match(str, #"Manufacturer(.*?)\n\n\n",
RegexOptions.IgnoreCase | RegexOptions.Singleline);
if (regex.Success)
MessageBox.Show("\"" + regex.Value + "\"");
The regex.Value is
"Manufacturer</h4>
Gigabyte
"
Best regards.
I replaced \n with another value and then Regex searched my replaced value. It is working for the time being, but it may not be the best approach. Any recommendations appreciated.
cleanedUpHtml = cleanedUpHtml.Replace("\n", "p19o9");
Match regex = Regex.Match(cleanedUpHtml, "Manufacturer(.*?)p19o9p19o9p19o9", RegexOptions.IgnoreCase);
Generally I prefere to cleanup the string from html tags and new-line characters before using the regex.
(.*?) stops capture with \n characer, you might use a more generic group instead, like ([\w|\W]*?)

Regex to match a word beginning with a period and ending with an underscore?

I'm quite the Regex novice, but I have a series of strings similar to this "[$myVar.myVar_STATE]" I need to replace the 2nd myVar that begins with a period and ends with an underscore. I need it to match it exactly, as sometimes I'll have "[$myVar.myVar_moreMyVar_STATE]" and in that case I wouldn't want to replace anything.
I've tried things like "\b.myVar_\b", "\.\bmyVar_\b" and several more, all to no luck.
How about this:
\[\$myVar\.([^_]+)_STATE\]
Matches:
[$myVar.myVar_STATE] // matches and captures 'myvar'
[$myVar.myVar_moreMyVar_STATE] // no match
Working regex example:
http://regex101.com/r/yM9jQ3
Or if _STATE was variable, you could use this: (as long as the text in the STATE part does not have underscores in it.)
\[\$myVar\.([^_]+)_[^_]+\]
Working regex example:
http://regex101.com/r/kW8oE1
Edit: Conforming to OP's comments below, This should be what he's going for:
(\[\$myVar\.)([^_]+)(_[^_]+\])
Regex replace example:
http://regex101.com/r/pU6yL8
C#
var pattern = #"(\[\$myVar\.)([^_]+)(_[^_]+\])";
var replaced = Regex.Replace(input, pattern, "$1"+ newVar + "$3")
What about something like:
.*.(myVar_).*
This looks for anything then a . and "myVar_" followed by anything.
It matches:
"[$myVar.myVar_STATE]"
And only the first myVar_ here:
"[$myVar.myVar_moremyVar_STATE]"
See it in action.
This should do it:
\[\$myVar\.(.*?)_STATE\]
You can use this little trick to pick out the groups, and build the replacement at the end, like so:
var replacement = "something";
var input = #"[$myVar.myVar_STATE]";
var pattern = #"(\[\$myVar\.)(.*?)_(.*?)]";
var replaced = Regex.Replace(input, pattern, "$1"+ replacement + "_$2]")
C# already has builtin method to do this
string text = ".asda_";
Response.Write((text.StartsWith(".") && text.EndsWith("_")));
Is Regex really required?
string input = "[$myVar.myVar_STATE]";
string oldVar = "myVar";
string newVar = "myNewVar";
string result = input.Replace("." + oldVar + "_STATE]", "." + newVar + "_STATE]");
In case "STATE" is a variable part, then we'll need to use Regex. The easiest way is to use this Regex pattern which matches a position between a prefix and a suffix. Prefix and suffix are used for searching but are not included in the resulting match:
(?<=prefix)find(?=suffix)
result =
Regex.Replace(input, #"(?<=\.)" + Regex.Escape(oldVar) + "(?=_[A-Z]+])", newVar);
Explanation:
The prefix part is \., which stand for ".".
The find part is the escaped old variable to be replaced. Regex escaping makes sure that characters with a special meaning in Regex are escaped.
The suffix part is _[A-Z]+], an underscore followed by at least one letter followed by "]". Note: the second ] needs not to be escaped. An opening bracket [ would have to be escaped like this: \[. We cannot use \w for word characters for the STATE-part as \w includes underscores. You might have to adapt the [A-Z] part to exactly match all possible states (e.g. if state has digits, use [A-Z0-9].

regex between characters including end

This regex below captures the -aaaa and -cccc but not the -eee
How can I do that?
keywords = "-aaa bbb -ccc -eee";
MatchCollection wordColNegEnd = Regex.Matches(keywords, #"-(.*?) ");
Use a "word boundary" /\b/ instead of a space, which matches the end of the string as well as a word/non-word boundary:
Regex.Matches(keywords, #"-(.*?)\b");
or, depending on what characters may be in the strings, just use "word characters" /\w/ to match the pattern:
Regex.Matches(keywords, #"-(\w+)");
MatchCollection worldColNegEnd = Regex.Matches(keywords, #"-(.*?)\b"
Word boundary is better than space, please give someone else upvotes though, since I brain farted the purpose of it.
Also I don't know why you included a ? in your original so I left it, but I believe it is not necessary, as * matches 0 or more matches.
Use
MatchCollection wordColNegEnd = Regex.Matches(keywords, #"-(.+?)\b");
Currently, your regex requires a trailing space behind the capturing group. the strings "aaa" and "ccc" have this, but "eee" does not.
Instead of matching any characters occurring after a dash, try matching nonspace characters:
#"-(\S*?)"
keywords = "-aaa bbb -ccc -eee";
MatchCollection wordColNegEnd = Regex.Matches(keywords, #"-\w+");
You haven't specified what exactly you are trying to match here.
But if I understood it right, you want to match any alpha string that starts with -
Use this RegEx: -[a-z]+

Regex for word.otherword

I want a Regular Expression for a word.otherword form. I tried \b[a-z]\.[a-z]\b, but it gives me an error at the \. part, saying Unrecognized escape sequence. Any idea what's wrong? I'm working under .NET C#. Thanks!
LE:
john.Smith or JoHn.SmItH or JOHN.SMITH should work.
John Smith or john!Smith or john.Smith.Smith shouldn't work.
Try this :
foundMatch = Regex.IsMatch(SubjectString, #"\b[a-z]\.[a-z]\b");
Probably you were not using #?
Your regex tries to match a.a this means a single character. But since you want it to match complete words you need a quantifier e.g.
\b[a-z]+\.[a-z]+\b
Finally you may want to use the case insensitive match to allow for words with capital letters to be matched too :
foundMatch = Regex.IsMatch(SubjectString, #"\b[a-z]+\.[a-z]+\b", RegexOptions.IgnoreCase);
This will match all words.words with at least one character for each word regardless of capitalization.
This will match all word.otherword only if there is a space behind the first word or it is the start of the string and only if there is a space after the second word or it is the end of the string.
foundMatch = Regex.IsMatch(SubjectString, #"(?<=\s|^)\b[a-z]+\.[a-z]+\b(?=\s|$)", RegexOptions.IgnoreCase);
Try this regex for word.word format:
#"\b([a-z]+)\.\1"
For word.otherword use this:
#"\b[a-z]+\.[a-z]+\b"

Regex with can be numerous ----- and must newline after the string

How would a regex look like when I search for this string:
before CAN be many comment lines --------"Encrypted" after must come a newline.
this does not seem to work:
Regex pattern = new Regex(#"^[-]*$[Encrypted][\n]");
what do I wrong?
The pattern you're searching for is not entirely clear to me, nor are the rest of the contents you're searching in, but if you're really just looking for "Encrypted" directly followed by only a newline then this is all you need to do:
Regex r = new Regex(#"Encrypted\n")
EDIT
Ok, comments seem to suggest that you're looking for zero or more occurences of "-", followed by "Encrypted", followed by newline. In that case the following will work.
Regex r = new Regex(#"-*Encrypted\n");
If there should be at least one "-" before "Encrypted", it will be
Regex r = new Regex(#"-+Encrypted\n");
Try it without the square brackets and move the dollar $ to the end of the pattern. Ie:
Regex pattern = new Regex(#"^-*Encrypted$");
The square brackets is like an Or statement. So [Encrypted] is the same as saying: 'E' or 'n' or 'c' or 'r'.... or 'd'.
The dollar symbol matches the end of the string.
I don't know specifically regex for c# but i thing you must not put the ^ because -'s are not at the begining of the line. And what the $ is doing in the middle.
So i would do in PCRE regex:
/-+"Encrypted" *\n/
That match one or more - followed by "Encryption" followed by 0 or more space followed by newline.
#Sanju
does this work :P
Regex pattern = new Regex(#"-*Encrypted\n");
for me it did! You have to remove the "^" char

Categories

Resources