C# regex optional group not working

I am trying to get parts from these strings:
first:
2F4449534301224E4F204445534352495054494F4E20415641494C41424C45011F30303034342D30313230382D
second:
2F4449534301224E4F204445534352495054494F4E20415641494C41424C45011F30303130312D3032323534012630303130312D31303932342D
Basically, I want to capture these parts from both strings:
first:
2F(.+)011F(.+)2D
second:
2F(.+)011F(.+)0126(.+)2D
I am trying to use this pattern:
Match m = Regex.Match(this.__line,
    @"^2F.*22(.*)011F(.*)(0126.*)?.{2}$",
    RegexOptions.IgnoreCase);
However, when I try:
if (m.Success)
{
    if (m.Groups[3].Value != "")
    {
        Console.WriteLine("good");
    }
}
else
{
    Console.WriteLine("bad");
}
I get "bad" from the second string because it is not matching the pattern. Am I not using the correct pattern?

The problem is that your pattern is greedy. You should use this pattern instead:
^2F.*22(.*?)011F(.*?)(0126.*?)?.{2}$
The second group in your regex matches everything up to the last 2 characters at the end because it is greedy and the last group is optional.
To make a quantifier non-greedy, put a ? after it.
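A minimal sketch of the difference, using the second sample string from the question. The variable names are only illustrative, and the snippet assumes a compiler that supports top-level statements (C# 9 or later).

```csharp
using System;
using System.Text.RegularExpressions;

// Second sample string from the question.
string second = "2F4449534301224E4F204445534352495054494F4E20415641494C41424C45011F30303130312D3032323534012630303130312D31303932342D";

// Greedy: group 2 runs to the end of the string, so the optional group 3 stays empty.
Match greedy = Regex.Match(second, @"^2F.*22(.*)011F(.*)(0126.*)?.{2}$", RegexOptions.IgnoreCase);

// Lazy: group 2 stops as soon as the rest of the pattern can match.
Match lazy = Regex.Match(second, @"^2F.*22(.*?)011F(.*?)(0126.*?)?.{2}$", RegexOptions.IgnoreCase);

Console.WriteLine(greedy.Groups[3].Success); // False
Console.WriteLine(lazy.Groups[2].Value);     // 30303130312D3032323534
Console.WriteLine(lazy.Groups[3].Value);     // 012630303130312D3130393234
```

Checking Groups[3].Success is a little more direct than comparing the value with an empty string.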
Here is more info about greedy and nongreedy.
Hope this helps.

Take out the "^".
2F.*22(.*)011F(.*)(0126.*)?.{2}$
http://regexpal.com/ is my hands down favorite regex tool.

I would like to give you some advice. These are not answers to your question, just some good-practice tips:
The "anything but newline" symbol (.) can perform poorly, so avoid it where possible. As far as I can see, you could replace it with \S
For a case-insensitive match, use the syntax (?i:pattern). This gives you the option of choosing RegexOptions.Compiled, which will give you better performance
For retrieving text, the use of named capture groups is recommended. Use the syntax (?<name>pattern). This way you can retrieve it by regexMatch.Groups["name"].Captures[0].Value
Whenever you have a group that you do not want to retrieve (only for matching purposes), mark it as a non-capturing group, using the syntax (?:pattern)
Lastly, RegexBuddy is a great (yet paid) tool. Highly recommended.
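As a rough illustration of the syntax tips above, here is a sketch applied to the second sample string from the question. The group names description, first and second are made up for this example, and the snippet assumes top-level statements (C# 9 or later).

```csharp
using System;
using System.Text.RegularExpressions;

string line = "2F4449534301224E4F204445534352495054494F4E20415641494C41424C45011F30303130312D3032323534012630303130312D31303932342D";

// (?i:...) switches on case-insensitivity inline, (?<name>...) names a capture,
// and (?:...) groups without capturing. The group names are hypothetical.
var regex = new Regex(
    @"(?i:^2F.*22(?<description>.*?)011F(?<first>.*?)(?:0126(?<second>.*?))?.{2}$)",
    RegexOptions.Compiled);

Match m = regex.Match(line);
Console.WriteLine(m.Groups["first"].Value);  // 30303130312D3032323534
Console.WriteLine(m.Groups["second"].Value); // 30303130312D3130393234
```

Because the 0126 marker sits in a non-capturing group, the named group second holds only the part after it.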
Regards.

Related

Find the Last Match in a Regular Expression

I have a string and a regular expression that I am running against it. But instead of the first match, I am interested in the last match of the Regular Expression.
Is there a quick/easy way to do this?
I am currently using Regex.Matches, which returns a MatchCollection, but it doesn't accept any parameters that will help me, so I have to go through the collection and grab the last one. But it seems there should be an easier way to do this. Is there?

The .NET regex flavor allows you to search for matches from right to left instead of left to right. It's the only flavor I know of that offers such a feature. It's cleaner and more efficient than the traditional methods, such as prefixing the regex with .*, or searching out all matches so you can take the last one.
To use it, pass this option when you call Match() (or other regex method):
RegexOptions.RightToLeft
More information can be found here.
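A small sketch of the option in use; the input string and the \d+ pattern are invented for the example, and the snippet assumes top-level statements (C# 9 or later).

```csharp
using System;
using System.Text.RegularExpressions;

string input = "a1 b22 c333 end";

// With RightToLeft the scan starts at the end of the input,
// so the first match returned is the last one in the text.
Match last = Regex.Match(input, @"\d+", RegexOptions.RightToLeft);

Console.WriteLine(last.Value); // 333
```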

Regex regex = new Regex("REGEX");
MatchCollection matches = regex.Matches("YOUR TEXT");
string s = matches.Count > 0 ? matches[matches.Count - 1].Value : null;

You could use Linq LastOrDefault or Last to get the last match from MatchCollection.
var lastMatch = Regex.Matches(input,"hello")
.OfType<Match>()
.LastOrDefault();
Check this Demo
Or, as @jdweng mentioned in the comment, you could even access it by index.
Match lastMatch = matches[matches.Count - 1];

Regular Expression Pattern Matching

Hi I need to do like this.
Actually **ctu** is a good university but **ctu's** is not. There are many **,ctus,** present.
What I want to do is, I want to replace ctu in the string like this.
Actually **<s>ctu<e>** is a good university but **<s>ctu's<e>** is not. There are many **,<s>ctus<e>,** present.
But with the following pattern
**\\bctu*(?:['\\\\|""\\\\]*)\\w+\\b**
I'm getting the out put as:
A**<s>ctu<e>**ally **<s>ctu<e>** is a good university but **<s>ctu's<e>** is not. There are many **,ctus,** present.
I don't want to replace ctu inside words such as Actually, and I also need to replace " ,ctus, " with " ,<s>ctus<e>, "
How do I achieve this using regex? I need this in C#.
Thanks in advance.

The following regex matches all the cases listed in your example:
@"(\bctu(?:'\w+)?\w*\b)"
Then just replace the match with "<s>$1<e>", where $1 refers to the capture group above (.NET replacement strings use $1 rather than \1).
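A short sketch of that replacement on the sample sentence from the question. In a .NET replacement string the first capture group is written $1. The snippet assumes top-level statements (C# 9 or later).

```csharp
using System;
using System.Text.RegularExpressions;

string input = "Actually ctu is a good university but ctu's is not. There are many ,ctus, present.";

// \b keeps the pattern from matching the "ctu" inside "Actually".
string output = Regex.Replace(input, @"(\bctu(?:'\w+)?\w*\b)", "<s>$1<e>");

Console.WriteLine(output);
// Actually <s>ctu<e> is a good university but <s>ctu's<e> is not. There are many ,<s>ctus<e>, present.
```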

Are you looking for @"\bctu\b" ("ctu" with word boundaries on both sides, so it matches ctu but not Actually, ctu's, or ,ctus,) for the first search pattern and ",ctus," (exactly the string ,ctus,, regardless of where it might fall in a word) as the second search pattern? To search for both of these at once, you could use @"(\bctu\b|,ctus,)".
As a slight aside, in C# you can write regex literals much more easily by using the @"" notation (verbatim strings) instead of "". E.g. to get regex to understand a word boundary, it must see \b, which can be represented as @"\b" or "\\b", and a literal \ is "\\\\" or @"\\". The first is easier to read, especially in more complex cases.
If this doesn't answer your question, please give a clear example of expected input/output.

Regular Expression Not working in .net

I'm using the following expression.
\W[A-C]{3}
The objective is to match 3 characters of anything between A and C that don't have any characters before them. So with input "ABC" it matches but "DABC" does not.
When I try this expression using various online regex tools (e.g. http://gskinner.com/RegExr/), it works perfectly. When I try to use it in an ASP.NET RegularExpressionValidator or with the Regex class, it never matches anything.
I've tried various different methods of not allowing a character before the match, e.g.
[^\w] and [^a-zA-Z0-9]
All of them work in the online tools, but not in .NET.
This test fails, but I'm not sure why:
[Test]
public void RegExWorks()
{
    var regex = new Regex("\\W[A-C]{3}");
    Match match = regex.Match("ABC");
    Assert.IsTrue(match.Success);
}

How about something like this:
^[A-C]{3}
It is simple, but seems to fit what you are asking, and I tested it in rubular.com and .NET

The problem is that you require there to be a \W character. Use alternation to fix that, or a lookbehind to make sure there are no invalid characters.
Alternation:
(?:\W|^)[A-C]{3}
But I'd prefer a negative lookbehind:
(?<!\w)[A-C]{3}
\b (as in gymbrall's answer) is short for (?<!\w)(?=\w)|(?<=\w)(?!\w), which in this case would just mean (?<!\w), thus being equivalent.
Also, in C# you can use the @ quoting so you don't have to double-escape things, e.g.:
var regex = new Regex(@"(?<!\w)[A-C]{3}");
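To make the difference concrete, here is a small sketch comparing the original \W pattern with the negative lookbehind. The variable names are illustrative, and the snippet assumes top-level statements (C# 9 or later).

```csharp
using System;
using System.Text.RegularExpressions;

// \W must consume a non-word character, so there has to be one before the letters.
var required = new Regex(@"\W[A-C]{3}");

// The lookbehind consumes nothing; it only forbids a word character before the letters.
var lookbehind = new Regex(@"(?<!\w)[A-C]{3}");

bool requiredOnAbc = required.IsMatch("ABC");
bool lookbehindOnAbc = lookbehind.IsMatch("ABC");
bool lookbehindOnDabc = lookbehind.IsMatch("DABC");
bool lookbehindOnDollarAbc = lookbehind.IsMatch("$ABC");

Console.WriteLine(requiredOnAbc);         // False
Console.WriteLine(lookbehindOnAbc);       // True
Console.WriteLine(lookbehindOnDabc);      // False
Console.WriteLine(lookbehindOnDollarAbc); // True
```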

You should consider trying:
[Test]
public void RegExWorks()
{
    var regex = new Regex("\\b[A-C]{3}");
    Match match = regex.Match("ABC");
    Assert.IsTrue(match.Success);
}
The \\b matches a word boundary, which means it will match "ABC" as well as " ABC" and "$ABC". Using \\W requires there to be a non-word character, which doesn't sound like it is what you want.
Let me know if I'm missing something.

It is as simple as "[A-C]{3}".

OK so you can try following Expression
"[A-C][A-C]{2}"

Need some C# Regular Expression Help

I'm trying to come up with a regular expression that will stop at the first occurrence of </ol>. My current regex sort of works, but only if </ol> has spaces on either end. For instance, instead of stopping at the first instance in the line below, it stops at the second:
some random text and HTML</ol></b> bla </ol>
Here's the pattern I'm currently using: string pattern = @"some random text(.|\r|\n)*</ol>";
What am I doing wrong?

string pattern = @"some random text(.|\r|\n)*?</ol>";
Note the question mark after the star -- that tells it to be non greedy, which basically means that it will capture as little as possible, rather than the greedy as much as possible.

Make your wild-card "ungreedy" by adding a ?. e.g.
some random text(.|\r|\n)*?</ol>
                          ^- Addition
This will make regex match as few characters as possible, instead of matching as many (standard behavior).
Oh, and regex shouldn't parse [X]HTML

While not a Regex, why not simply use the Substring functions, like:
string returnString = someRandomText.Substring(0, someRandomText.IndexOf("</ol>"));
That would seem to be a lot easier than coming up with a Regex to cover all the possible varieties of characters, spaces, etc.

This regex matches everything from the beginning of the string up to the first </ol>. It uses Friedl's "unrolling-the-loop" technique, so is quite efficient:
Regex pattern = new Regex(
    @"^[^<]*(?:(?!</ol\b)<[^<]*)*(?=</ol\b)",
    RegexOptions.IgnoreCase);
resultString = pattern.Match(text).Value;

Others have already explained the missing ? that makes the quantifier non-greedy. I also want to suggest another change.
I don't like your (.|\r|\n) part. An alternation of single characters is usually simpler, and easier to read, as a character class. Note, though, that a dot inside a character class is a literal dot, so [.\r\n] would not do the same thing.
In your special case, where the alternatives to the . are only newline characters, a class is not the right tool anyway. Here you should do this:
Regex A = new Regex(@"some random text.*?</ol>", RegexOptions.Singleline);
Use the Singleline modifier. It makes the . match newline characters as well.
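A brief sketch of how the lazy quantifier and Singleline interact. The input is the sample line from the question with a newline added for illustration, and the snippet assumes top-level statements (C# 9 or later).

```csharp
using System;
using System.Text.RegularExpressions;

// Sample text from the question, with an added line break to show Singleline at work.
string text = "some random text\nand HTML</ol></b> bla </ol>";

// Lazy + Singleline: stops at the first </ol>, and . is allowed to cross the newline.
Match lazy = Regex.Match(text, @"some random text.*?</ol>", RegexOptions.Singleline);

// Greedy + Singleline: runs on to the last </ol>.
Match greedy = Regex.Match(text, @"some random text.*</ol>", RegexOptions.Singleline);

// Without Singleline, . cannot match the newline, so nothing matches here.
Match noFlag = Regex.Match(text, @"some random text.*?</ol>");

Console.WriteLine(lazy.Value == "some random text\nand HTML</ol>"); // True
Console.WriteLine(greedy.Value == text);                            // True
Console.WriteLine(noFlag.Success);                                  // False
```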

Any ideas why this does not work? C#

public class MyExample
{
    public static void Main(String[] args)
    {
        string input = "The Venture Bros</p></li>";
        // Call Regex.Match
        Match m = Regex.Match(input, "/show_name=(.*?)&show_name_exact=true\">(.*?)</i");
        // Check Match instance
        if (m.Success)
        {
            // Get Group value
            string key = m.Groups[1].Value;
            Console.WriteLine(key);
            // alternate-1
        }
    }
}
I want "The Venture Bros" as output (in this example).

try this :
string input = "The Venture Bros</p></li>";
// Call Regex.Match
Match m = Regex.Match(input, "show_name=(.*?)&show_name_exact=true\">(.*?)</a");
// Check Match instance
if (m.Success)
{
    // Get Group value
    string key = m.Groups[2].Value;
    Console.WriteLine(key);
    // alternate-1
}

I think it's because you're trying to use Perl-style slashes at the front and the end. A couple of other answerers have been confused by this already. The way it's written, the pattern tries to be case-insensitive by starting and ending with / and putting an i on the end, the way you'd do it in Perl.
But I'm pretty sure that .NET regexes don't work that way, and that's what's causing the problem.
Edit: to be more specific, look into RegexOptions, an example I pulled from MSDN is like this:
Dim rx As New Regex("\b(?<word>\w+)\s+(\k<word>)\b", RegexOptions.Compiled Or RegexOptions.IgnoreCase)
The key there is the "RegexOptions.IgnoreCase", that'll cause the effect that you were trying for with /pattern/i.
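A C# sketch of the same point; the input and pattern here are made up for illustration, and the snippet assumes top-level statements (C# 9 or later).

```csharp
using System;
using System.Text.RegularExpressions;

// In .NET the slashes and the trailing i are ordinary pattern characters,
// so this looks for the literal text "/hello/i".
bool perlStyle = Regex.IsMatch("HELLO", "/hello/i");

// Case-insensitivity is requested through an option...
bool withOption = Regex.IsMatch("HELLO", "hello", RegexOptions.IgnoreCase);

// ...or through the inline (?i) modifier.
bool withInline = Regex.IsMatch("HELLO", "(?i)hello");

Console.WriteLine(perlStyle);  // False
Console.WriteLine(withOption); // True
Console.WriteLine(withInline); // True
```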

The correct regex in your case would be
^.*&show_name_exact=true\"\>(.*)</a></p></li>$
Regex is tricky, but at http://www.regular-expressions.info/ you can find a great tutorial.

/?show_name=(.*)&show_name_exact=true\">(.*)
would work as you expect, I believe. Another thing I notice is that you're trying to get the value of Groups[1], but I believe you want Groups[2]: there will be 3 groups, where Groups[0] is the whole match, Groups[1] is the first capture group, and Groups[2] is the second.
Gl ;)

Because of the question mark before show_name. It is in input but not in pattern, thus no match.
Also, you try to match </i but the input doesn't contain this (it contains </li>).

First, the regex starts with "/show_name", but the target string has "/?show_name", so the pattern cannot match where you expect it to.
This will cause the whole regex to fail.

Ok, let's break this down.
Test Data: "The Venture Bros</p></li>"
Original Regex: "/show_name=(.*?)&show_name_exact=true\">(.*?)</i"
Working Regex: "/\?show_name=(.*)&show_name_exact=true\">(.*)</a"
We'll start at the left and work our way to the right, through the regex.
"?" became "\?": an unescaped "?" means that the preceding character or group is optional. When we put a backslash before it, it matches a literal question mark.
"(.*?)" became "(.*)": the parentheses denote a group. Note that a "?" directly after "*" does not mean "optional"; it makes the quantifier lazy, so it matches as little as possible. Either form works when the input contains a single link, but the lazy form is the safer choice when there may be more than one.
"</i" became "</a" this change was made to match your actual text which terminates the anchor with a "</a>" tag.
Suggested Regex: "[\\W]show_name=([^><\"]*)&show_name_exact=true\">([^<]*)<"
(The extra \'s were added to provide proper c# string escaping.)
A good tool for testing regular expressions in c#, is the regex-freetool at code.google.com
