Find the Last Match in a Regular Expression - c#

I have a string and a regular expression that I am running against it. But instead of the first match, I am interested in the last match of the Regular Expression.
Is there a quick/easy way to do this?
I am currently using Regex.Matches, which returns a MatchCollection, but it doesn't accept any parameters that will help me, so I have to go through the collection and grab the last one. But it seems there should be an easier way to do this. Is there?

The .NET regex flavor allows you to search for matches from right to left instead of left to right. It's the only flavor I know of that offers such a feature. It's cleaner and more efficient than the traditional methods, such as prefixing the regex with .*, or finding all matches so you can take the last one.
To use it, pass this option when you call Match() (or other regex method):
RegexOptions.RightToLeft
More information can be found here.
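A minimal sketch of the right-to-left search described above. The class and method names (LastMatchFinder, FindLast) are made up for illustration; only Regex.Match and RegexOptions.RightToLeft come from .NET.

```csharp
using System.Text.RegularExpressions;

public static class LastMatchFinder
{
    // Scans the input from its end towards its start, so the first match
    // the engine reports is the one closest to the end of the string.
    // If nothing matches, the returned Match has Success == false.
    public static Match FindLast(string input, string pattern)
    {
        return Regex.Match(input, pattern, RegexOptions.RightToLeft);
    }
}
```

For example, LastMatchFinder.FindLast("hello 1, hello 22, hello 333", @"\d+").Value should be "333". When matches could overlap (say, the pattern aa against aaa), the right-to-left result can differ from the last element of Regex.Matches, so it is worth checking which of the two you actually want.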

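Regex regex = new Regex("REGEX");
MatchCollection matches = regex.Matches("YOUR TEXT");
// Take the last element of the collection (null if there were no matches).
string s = matches.Count > 0 ? matches[matches.Count - 1].Value : null;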

You could use Linq LastOrDefault or Last to get the last match from MatchCollection.
var lastMatch = Regex.Matches(input,"hello")
.OfType<Match>()
.LastOrDefault();
Check this Demo
Or, as @jdweng mentioned in the comments, you could access it by index.
Match lastMatch = matches[matches.Count - 1];

Related

Regex.IsMatch does not return expected result

var managementCount = from tbdocheader in context.tblDocumentHeaders
join tbDocRevision in context.tblDocumentRevisions
on tbdocheader.DocumentHeaderID equals tbDocRevision.DocumentHeaderID
select new { tbdocheader, tbDocRevision };
var query =(from obj in managementCount.AsEnumerable()
where Regex.IsMatch(obj.tbDocRevision.Revision, @"[A-Za-z]%")
select obj).Count();
I'm trying to get the count of records where Revision starts with a letter. The "managementCount" query returns records with "Revision=A", but my query does not return any matching records.
Is something wrong with my regular expression?
Try the pattern "^[A-Za-z]*$"
Here
^ indicates start of an expression,
$ indicates end of an expression,
[A-Za-z] will allow any alphabet character and
[A-Za-z]* will allow any length of alphabet characters.
In C# code you would write:
@"^[A-Za-z]*$"
Here, the @ symbol makes the string a verbatim literal, so backslashes and control characters are not interpreted as escape sequences.
I hope this will help you..!
Try the pattern "^[A-Za-z]"...
var query =(from obj in managementCount.AsEnumerable()
where Regex.IsMatch(obj.tbDocRevision.Revision, @"^[A-Za-z]")
select obj).Count();
I think you're looking for the pattern "^[a-z]" with extra parameter RegexOptions.IgnoreCase.
It looks to me like you're used to SQL LIKE syntax. Regular expressions are different--they use different wildcard characters, have many more matching abilities, by default match multiple times in a string, and are also a lot harder to get right. SQL LIKE patterns are always implicitly anchored at the ends, and Regexes are not.
So the pattern above means, match starting at the beginning of the string ^, and then be followed by a letter. There is no need to add a wildcard character because Regexes are not anchored by default.
I encourage you to go do some reading and study. Try regular-expressions.info.
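To make the difference from SQL LIKE concrete, here is a small sketch of the "starts with a letter" test using the ^[a-z] pattern with RegexOptions.IgnoreCase suggested above. RevisionFilter and StartsWithLetter are hypothetical names, not part of the asker's code.

```csharp
using System.Text.RegularExpressions;

public static class RevisionFilter
{
    // True when the first character of the revision is an ASCII letter.
    // ^ anchors the match at the start of the string; nothing is needed
    // after the character class because the rest of the string is ignored.
    public static bool StartsWithLetter(string revision)
    {
        return revision != null
            && Regex.IsMatch(revision, "^[a-z]", RegexOptions.IgnoreCase);
    }
}
```

In the query from the question, the where clause would then read where RevisionFilter.StartsWithLetter(obj.tbDocRevision.Revision). The original pattern [A-Za-z]% found nothing because % is not a wildcard in a regular expression; it only matches a literal percent sign.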

Find exact url match

I want to find an exact URL match in a list of URLs using a regular expression.
string url = @"http://web/P02/Draw/V/Service.svc";
string myword = @"http://web/P02/Draw/V/Service.svc http://web/P02/Draw/V/Service.svc?wsdl";
string pattern = @"(^|\s)" + url + @"(\s|$)";
Match match = Regex.Match(pattern, myword);
if (match.Success)
{
myword = Regex.Replace(myword, pattern, "pattern");
}
But the pattern returns no result.
What do you think is the problem ?
Strange formatting aside, here is a pattern to match each individual URL in your list.
Pattern = @"http://([a-zA-Z]|/|[0-9])*\.svc";
Frankly, I don't think you're having issues with syntax or implementation. If you want to tweak the expression I wrote above, this is the place to do it: Online RegEx Tool
You're passing the arguments to Regex.Match in the wrong order. You need to swap them, like this:
Match match = Regex.Match(myword,pattern);
Why not use Linq on the string collection (after splitting on a space)?
myword.Split(' ').Where(x => x.Equals(url)).Single().Replace(url, "pattern");
As has been pointed out, you've got your arguments the wrong way around.
A . in a regular expression pattern is a special character, so you need to escape url when you use it to build the pattern; you can use Regex.Escape(url).
You don't need to check that the match succeeded before performing the replacement, unless you have other logic that depends on whether it did.
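The three points above can be sketched together as follows. ExactUrlReplacer and ReplaceExact are illustrative names. One deliberate change from the question: the (^|\s) and (\s|$) groups are swapped for the lookarounds (?<!\S) and (?!\S), an assumption on my part so that the surrounding whitespace is not consumed by the replacement.

```csharp
using System.Text.RegularExpressions;

public static class ExactUrlReplacer
{
    // Replaces every whitespace-delimited occurrence of url in list.
    // Regex.Escape makes characters such as '.' and '?' match literally.
    public static string ReplaceExact(string list, string url, string replacement)
    {
        // (?<!\S): not preceded by a non-whitespace character.
        // (?!\S):  not followed by a non-whitespace character.
        string pattern = @"(?<!\S)" + Regex.Escape(url) + @"(?!\S)";

        // Input first, pattern second; no Success check is needed because
        // Replace returns the input unchanged when nothing matches.
        return Regex.Replace(list, pattern, replacement);
    }
}
```

With the strings from the question, ExactUrlReplacer.ReplaceExact(myword, url, "pattern") should replace only the first URL and leave the ?wsdl one alone.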

C# regex optional group not working

I am trying to get parts from these strings:
first:
2F4449534301224E4F204445534352495054494F4E20415641494C41424C45011F30303034342D30313230382D
second:
2F4449534301224E4F204445534352495054494F4E20415641494C41424C45011F30303130312D3032323534012630303130312D31303932342D
basically I want to return for both strings:
first:
2F(.+)011F(.+)2D
second:
2F(.+)011F(.+)0126(.+)2D
I am trying to use this pattern:
Match m = Regex.Match(this.__line,
@"^2F.*22(.*)011F(.*)(0126.*)?.{2}$",
RegexOptions.IgnoreCase);
However, when I try:
if (m.Success)
{
if (m.Groups[3].Value != "")
{
Console.WriteLine("good");
}
}
else
{
Console.WriteLine("bad");
}
I get "bad" from the second string because it is not matching the pattern. Am I not using the correct pattern?
The problem is that your pattern is greedy. You should use this pattern instead:
^2F.*22(.*?)011F(.*?)(0126.*?)?.{2}$
The second group in your regex matches everything up to the last 2 characters at the end because it is greedy and the last group is optional.
To make your matches nongreedy use a ? after the quantifier.
Here is more info about greedy and nongreedy.
Hope this helps.
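A small sketch comparing the greedy and lazy patterns on the second string from the question. OptionalGroupDemo and its members are hypothetical names introduced only for this illustration.

```csharp
using System.Text.RegularExpressions;

public static class OptionalGroupDemo
{
    // The second sample string from the question.
    public const string Second =
        "2F4449534301224E4F204445534352495054494F4E20415641494C41424C45011F30303130312D3032323534012630303130312D31303932342D";

    public const string Greedy = @"^2F.*22(.*)011F(.*)(0126.*)?.{2}$";
    public const string Lazy = @"^2F.*22(.*?)011F(.*?)(0126.*?)?.{2}$";

    // Returns the text captured by group 3, or "" when the pattern does not
    // match or the optional group did not take part in the match.
    public static string ThirdGroup(string pattern, string line)
    {
        Match m = Regex.Match(line, pattern, RegexOptions.IgnoreCase);
        return m.Success ? m.Groups[3].Value : "";
    }
}
```

With the greedy pattern, group 2 swallows the 0126 part, so ThirdGroup(Greedy, Second) comes back empty even though the match itself succeeds. With the lazy pattern, group 2 stops as soon as the optional group can match, and group 3 captures the text starting at 0126.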
Take out the "^".
2F.*22(.*)011F(.*)(0126.*)?.{2}$
http://regexpal.com/ is my hands down favorite regex tool.
I would like to give you some advice. These are not answers to your question, just some good-practice tips:
The "anything but newline" symbol (.) performs very poorly, so you should avoid it whenever possible. As far as I can see, you could replace it with \S
For a case-insensitive match, use the syntax (?i:pattern). This gives you the option of choosing RegexOptions.Compiled, which will give you better performance
For retrieving text, named capture groups are recommended. Use the syntax (?<name>pattern). This way you can retrieve the text with regexMatch.Groups["name"].Captures[0].Value
Whenever you have a group that you do not want to retrieve (one used only for matching purposes), mark it as a non-capturing group, using the syntax (?:pattern)
Lastly, RegexBuddy is a great (yet paid) tool. Highly recommended.
Regards.

Need some C# Regular Expression Help

I'm trying to come up with a regular expression that will stop at the first occurrence of </ol>. My current RegEx sort of works, but only if </ol> has spaces on either end. For instance, instead of stopping at the first instance in the line below, it stops at the second
some random text and HTML</ol></b> bla </ol>
Here's the pattern I'm currently using: string pattern = @"some random text(.|\r|\n)*</ol>";
What am I doing wrong?
string pattern = @"some random text(.|\r|\n)*?</ol>";
Note the question mark after the star -- that tells it to be non greedy, which basically means that it will capture as little as possible, rather than the greedy as much as possible.
Make your wild-card "ungreedy" by adding a ?. e.g.
some random text(.|\r|\n)*?</ol>
^- Addition
This will make regex match as few characters as possible, instead of matching as many (standard behavior).
Oh, and regex shouldn't parse [X]HTML
While not a Regex, why not simply use the Substring functions, like:
string returnString = someRandomText.Substring(0, someRandomText.IndexOf("</ol>"));
That would seem to be a lot easier than coming up with a Regex to cover all the possible varieties of characters, spaces, etc.
This regex matches everything from the beginning of the string up to the first </ol>. It uses Friedl's "unrolling-the-loop" technique, so is quite efficient:
Regex pattern = new Regex(
@"^[^<]*(?:(?!</ol\b)<[^<]*)*(?=</ol\b)",
RegexOptions.IgnoreCase);
resultString = pattern.Match(text).Value;
Others have already explained the missing ? that makes the quantifier non-greedy. I also want to suggest another change.
I don't like your (.|\r|\n) part. When an alternation contains only single characters, a character class is usually simpler and easier to read (and possibly more efficient, though I haven't checked). Note, however, that . is a literal dot inside a character class, so [.\r\n] would not do the same thing; a class such as [\s\S] matches any character, including newlines.
BUT in your special case when the alternatives to the . are only newline characters, this is also not the correct way. Here you should do this:
Regex A = new Regex(@"some random text.*?</ol>", RegexOptions.Singleline);
Use the Singleline modifier. It just makes the . match also newline characters.
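A short sketch of the lazy quantifier combined with RegexOptions.Singleline. FirstOlCloser and UpToFirstClose are names invented for this example.

```csharp
using System.Text.RegularExpressions;

public static class FirstOlCloser
{
    // Lazy .*? stops at the first </ol>; Singleline lets . match newlines,
    // so the (.|\r|\n) alternation is no longer needed.
    private static readonly Regex Pattern =
        new Regex(@"some random text.*?</ol>", RegexOptions.Singleline);

    // Returns the matched text, or "" when the input contains no match.
    public static string UpToFirstClose(string text)
    {
        Match m = Pattern.Match(text);
        return m.Success ? m.Value : "";
    }
}
```

For the sample line from the question, the result should end at the first </ol> rather than the second, and the same holds when a line break sits between the text and the closing tag.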

Simple C# regex question

Question: What's the simplest way to test whether a given Regex matches the whole string?
An example:
E.g. given Regex re = new Regex("."); I want to test whether a given input string has exactly one character using this Regex re. How do I do that?
In other words: I'm looking for a method of class Regex that works similarly to the matches() method of class Matcher in Java ("Attempts to match the entire region against the pattern.").
Edit: This question is not about getting the length of a string. The question is how to match whole strings with regular expressions. The example used here is only for demonstration purposes (normally everybody would check the Length property to recognise one-character strings).
If you are allowed to change the regular expression you should surround it by ^( ... )$. You can do this at runtime as follows:
Regex newRe = new Regex("^(" + re.ToString() + ")$");
The parentheses here are necessary to prevent creating a regular expression like ^a|ab$ which will not do what you want. This regular expression matches any string starting with a or any string ending in ab.
If you don't want to change the regular expression you can check Match.Value.Length == input.Length. This is the method that is used in ASP.NET regular expression validators. See my answer here for a fuller explanation.
Note this method can cause some curious issues that you should be aware of. The regular expression "a|ab" will match the string 'ab' but the value of the match will only be "a". So even though this regular expression could have matched the whole string, it did not. There is a warning about this in the documentation.
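Both approaches from this answer, sketched side by side so the a|ab pitfall is visible. WholeStringMatch, MatchesWhole and SameLength are hypothetical helper names; the sketch uses a non-capturing group (?:...) instead of plain parentheses so that the wrapper does not shift the numbering of any groups in the original pattern.

```csharp
using System.Text.RegularExpressions;

public static class WholeStringMatch
{
    // Approach 1: wrap the original pattern in ^(?: ... )$ at runtime,
    // keeping the options of the original Regex.
    public static bool MatchesWhole(Regex re, string input)
    {
        Regex anchored = new Regex("^(?:" + re.ToString() + ")$", re.Options);
        return anchored.IsMatch(input);
    }

    // Approach 2: leave the pattern alone and compare the length of the
    // first match with the length of the input.
    public static bool SameLength(Regex re, string input)
    {
        Match m = re.Match(input);
        return m.Success && m.Index == 0 && m.Length == input.Length;
    }
}
```

For new Regex("a|ab") and the input "ab", MatchesWhole should return true while SameLength returns false, because the unanchored match stops after "a". Also note that in .NET $ matches before a final newline as well, so \z is the stricter choice if a trailing "\n" must be rejected.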
Use an anchored pattern:
Regex re = new Regex("^.$");
For testing string length I'd check the .Length property though (str.Length == 1) …
"b".Length == 1
is a much better candidate than
Regex.IsMatch("b", "^.$")
You add "start-of-string" and "end-of-string" anchors
^.$
