I'm rather inexperienced in regex, but what I need to do is match a URL in order to route it correctly. Examples:
/2013/06/article-title
/2013/06
/2013
These are all the possible paths I need to check for. I did some research and found a little about checking for an exact length, but when I tried to modify it for my own use, the match returns false.
Here's what I had for the simplest:
^\\~/([0-9]{4})$
Any ideas? Thanks.
For reference, here's the code that tries to match it:
string url = HttpContext.Current.Request.AppRelativeCurrentExecutionFilePath;
Regex r = new Regex(regexp, RegexOptions.IgnoreCase);
m = r.Match(url);
return m.Success;
You can use non-capturing groups (?:...) followed by question marks, which make the groups optional:
^/[0-9]{4}(?:/[0-9]{2}(?:/[\w-]+)?)?$
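As a rough sketch of how the optional groups could be used for routing, the helper below wraps the pattern and returns the year, month, and title parts. The class name ArchiveRoute and the method TryParse are made up for illustration. The optional leading "~" is an assumption: AppRelativeCurrentExecutionFilePath normally returns an app-relative path such as "~/2013/06", which would explain why a pattern starting with "^/" never matches.

```csharp
using System.Text.RegularExpressions;

public static class ArchiveRoute
{
    // Same structure as the pattern above, with capturing groups for each part
    // and an optional leading "~" (assumption: the path is app-relative).
    private static readonly Regex Pattern = new Regex(
        @"^~?/([0-9]{4})(?:/([0-9]{2})(?:/([\w-]+))?)?$");

    // Returns true on a match; month and slug are null when the path omits them.
    public static bool TryParse(string path, out string year, out string month, out string slug)
    {
        Match m = Pattern.Match(path);
        year = m.Success ? m.Groups[1].Value : null;
        month = m.Success && m.Groups[2].Success ? m.Groups[2].Value : null;
        slug = m.Success && m.Groups[3].Success ? m.Groups[3].Value : null;
        return m.Success;
    }
}
```

Calling ArchiveRoute.TryParse("~/2013/06", out y, out m, out s) should succeed with the slug left null, while a path such as "/2013/6" is rejected because the month needs two digits.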
I have a string and a regular expression that I am running against it. But instead of the first match, I am interested in the last match of the Regular Expression.
Is there a quick/easy way to do this?
I am currently using Regex.Matches, which returns a MatchCollection, but it doesn't accept any parameters that will help me, so I have to go through the collection and grab the last one. But it seems there should be an easier way to do this. Is there?
The .NET regex flavor allows you to search for matches from right to left instead of left to right. It's the only flavor I know of that offers such a feature. It's cleaner and more efficient than the traditional methods, such as prefixing the regex with .*, or finding all matches so you can take the last one.
To use it, pass this option when you call Match() (or other regex method):
RegexOptions.RightToLeft
More information can be found here.
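A minimal sketch of the right-to-left search, assuming you only need the text of the last match. The class and method names (LastMatchFinder, FindLast) are hypothetical; Regex.Match and RegexOptions.RightToLeft are the real API.

```csharp
using System.Text.RegularExpressions;

public static class LastMatchFinder
{
    // Scans from the end of the input, so the first match found is the
    // one closest to the end. Returns null when nothing matches.
    public static string FindLast(string input, string pattern)
    {
        Match m = Regex.Match(input, pattern, RegexOptions.RightToLeft);
        return m.Success ? m.Value : null;
    }
}
```

For example, FindLast("a1 b22 c333", @"\d+") should give "333". Note that when candidate matches overlap, a right-to-left scan can pick a different span than the last item of a left-to-right Regex.Matches call, so check the result against your own pattern.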
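Regex regex = new Regex("REGEX");
MatchCollection matches = regex.Matches("YOUR TEXT");
// Take the last match in the collection (null when nothing matched).
string s = matches.Count > 0 ? matches[matches.Count - 1].Value : null;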
You could use Linq LastOrDefault or Last to get the last match from MatchCollection.
var lastMatch = Regex.Matches(input,"hello")
.OfType<Match>()
.LastOrDefault();
Check this Demo
Or, as @jdweng mentioned in the comments, you could even access the last match by index.
Match lastMatch = matches[matches.Count - 1];
I want to find an exact URL match in a list of URLs using a regular expression.
string url = @"http://web/P02/Draw/V/Service.svc";
string myword = @"http://web/P02/Draw/V/Service.svc http://web/P02/Draw/V/Service.svc?wsdl";
string pattern = @"(^|\s)" + url + @"(\s|$)";
Match match = Regex.Match(pattern, myword);
if (match.Success)
{
myword = Regex.Replace(myword, pattern, "pattern");
}
But the pattern returns no result.
What do you think is the problem ?
Strange formatting aside, here is a pattern to match each individual URL in your list.
string pattern = @"http://([a-zA-Z]|/|[0-9])*\.svc";
Frankly, I don't think you're having issues with syntax or implementation. If you want to tweak the expression I wrote above, this is the place to do it: Online RegEx Tool
You're passing the arguments to Regex.Match in the wrong order: the input comes first, then the pattern. Swap them like this:
Match match = Regex.Match(myword,pattern);
Why not use LINQ on the collection of strings you get by splitting on a space?
myword.Split(' ').Where(x => x.Equals(url)).Single().Replace(url, "pattern");
You've got your arguments the wrong way around, as has been pointed out.
The . character is special in a regular expression pattern, so you need to escape url when you use it to build pattern. You can use Regex.Escape(url) for that.
You don't need to check the match is a success before performing the replacement, unless you have other logic that depends on whether the match was a success.
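Putting those three points together, a sketch might look like the following. The names ExactUrlReplacer and ReplaceExact are invented for this example. It also swaps the (^|\s) and (\s|$) groups from the question for lookarounds, which is a deliberate change rather than part of the answers above: the surrounding whitespace is then not consumed by the replacement.

```csharp
using System.Text.RegularExpressions;

public static class ExactUrlReplacer
{
    // Replaces url only where it stands alone between whitespace
    // (or at the start/end of the list).
    public static string ReplaceExact(string list, string url, string replacement)
    {
        // Regex.Escape turns "." and other metacharacters into literals.
        string pattern = @"(?<!\S)" + Regex.Escape(url) + @"(?!\S)";
        // Input first, pattern second; the evaluator returns the replacement
        // verbatim, so "$" in it is not treated as a group reference.
        return Regex.Replace(list, pattern, m => replacement);
    }
}
```

With the data from the question, only the first entry should be replaced, because the second one continues with "?wsdl" and fails the (?!\S) lookahead.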
I was wondering if it is possible to build equivalent C# regular expression for finding this pattern in a filename. For example, this is the expr in perl /^filer_(\d{10}).txt(.gz)?$/i Could we find or extract the \d{10} part so I can use it in processing?
To create a Regex object that ignores character casing and matches your file names, try the following:
Regex fileFilter = new Regex(@"^filer_(\d{10})\.txt(\.gz)?$", RegexOptions.IgnoreCase);
To perform the match:
Match match = fileFilter.Match(filename);
And to get the value (number here):
if (match.Success)
{
    string id = match.Groups[1].Value;
}
The matched groups work similarly to Perl's matches: [0] references the whole match, [1] the first sub-pattern/match, etc.
Note: In your initial Perl code you didn't escape the . characters, so they'd match any character, not just real periods!
Yes, you can. See the Groups property of the Match class that is returned by a call to Regex.Match.
In your case, it would be something along the lines of the following:
Regex yourRegex = new Regex(@"^filer_(\d{10}).txt(.gz)?$");
Match match = yourRegex.Match(input);
if(match.Success)
result = match.Groups[1].Value;
I don't know what the /i at the end of your regex means, so I removed it in my sample code.
As daniel shows, you can access the content of the matched input via groups. But instead of using the default indexed groups, you can also use named groups. The following shows how, and also that you can use the static version of Match.
Match m = Regex.Match(input, @"^(?i)filer_(?<fileID>\d{10}).txt(?:.gz)?$");
if (m.Success)
{
    string s = m.Groups["fileID"].Value;
}
The /i in Perl means IgnoreCase, as also shown by Mario. This can also be set inline in the regex using (?i), as shown above.
The last part (?:.gz) creates a non-capturing group, which means that it’s used in the match but no group is created.
I'm not sure if that's what you want, but this is how you can do it.
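For comparison, here is a small sketch that combines the ideas from these answers: a named group, RegexOptions.IgnoreCase in place of Perl's /i, and escaped dots so that only real periods match. The class name FilerName and the method ExtractId are hypothetical.

```csharp
using System.Text.RegularExpressions;

public static class FilerName
{
    // "fileID" names the ten-digit group; (?:\.gz)? is optional and non-capturing.
    private static readonly Regex Pattern = new Regex(
        @"^filer_(?<fileID>\d{10})\.txt(?:\.gz)?$",
        RegexOptions.IgnoreCase);

    // Returns the ten-digit id, or null when the file name does not match.
    public static string ExtractId(string fileName)
    {
        Match m = Pattern.Match(fileName);
        return m.Success ? m.Groups["fileID"].Value : null;
    }
}
```

FilerName.ExtractId("filer_1234567890.txt.gz") should return "1234567890", while a name like "filer_1234567890xtxt" is rejected because the dot is now a literal.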
In C#, how would I capture the integer value in the URL like:
/someBlah/a/3434/b/232/999.aspx
I need to get the 999 value from the above url.
The url HAS to have the /someBlah/ in it.
All other values like a/3434/b/232/ can be any character/number.
Do I have to escape the '/'?
Try the following
var match = Regex.Match(url, @"^http://.*someBlah.*/(\w+)\.aspx$");
if ( match.Success ) {
string name = match.Groups[1].Value;
}
You didn't specify what names could appear in front of the ASPX file. I took the simple approach of using the \w character class, which matches letters, digits, and underscores. You can modify it as necessary to include other items.
You are effectively getting the file name without an extension.
Although you specifically asked for a regular expression, unless you are in a scenario where you really need to use one, I'd recommend that you use System.IO.Path.GetFileNameWithoutExtension:
Path.GetFileNameWithoutExtension(Context.Request.FilePath)
^(?:.+/)*(?:.+)?/someBlah/(?:.+/)*(.+)\.aspx$
This is a bit exhaustive, but it can handle scenarios where /someBlah/ does not have to be at the beginning of the string.
The ?: operator implies a non-capturing group, which may or may not be supported by your RegEx flavor.
Regex regex = new Regex("^http://.*someBlah.*/(\\d+).aspx$");
Match match = regex.Match(url);
int result;
if (match.Success)
{
int.TryParse(match.Groups[1].Value, out result);
}
Using \d rather than \w ensures that you only match digits, and unless the ignore case flag is set the capitalisation of someBlah must be correct.
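The example URL in the question is relative (it has no http:// prefix), so here is a hedged variant that does not require a scheme. PageIdParser and TryGetId are names invented for this sketch. On the escaping question: '/' has no special meaning in a .NET regex, so it does not need a backslash.

```csharp
using System.Text.RegularExpressions;

public static class PageIdParser
{
    // Requires "/someBlah/" somewhere before the final "<digits>.aspx" segment.
    private static readonly Regex Pattern =
        new Regex(@"/someBlah/(?:.*/)?(\d+)\.aspx$");

    // Returns true and sets id when the URL ends in a numeric .aspx page.
    public static bool TryGetId(string url, out int id)
    {
        id = 0;
        Match m = Pattern.Match(url);
        return m.Success && int.TryParse(m.Groups[1].Value, out id);
    }
}
```

For "/someBlah/a/3434/b/232/999.aspx" this should yield 999. The match is case-sensitive, so pass RegexOptions.IgnoreCase to the constructor if "someblah" should also be accepted.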
public class MyExample
{
public static void Main(String[] args)
{
string input = "The Venture Bros</p></li>";
// Call Regex.Match
Match m = Regex.Match(input, "/show_name=(.*?)&show_name_exact=true\">(.*?)</i");
// Check Match instance
if (m.Success)
{
// Get Group value
string key = m.Groups[1].Value;
Console.WriteLine(key);
// alternate-1
}
}
}
I want "The Venture Bros" as output (in this example).
Try this:
string input = "The Venture Bros</p></li>";
// Call Regex.Match
Match m = Regex.Match(input, "show_name=(.*?)&show_name_exact=true\">(.*?)</a");
// Check Match instance
if (m.Success)
{
// Get Group value
string key = m.Groups[2].Value;
Console.WriteLine(key);
// alternate-1
}
I think it's because you're trying to use Perl-style slashes at the front and the end. A couple of other answerers have been confused by this already. The way you've written it, you're trying to make the match case-insensitive by starting and ending with / and putting an i on the end, the way you'd do it in Perl.
But I'm pretty sure that .NET regexes don't work that way, and that's what's causing the problem.
Edit: to be more specific, look into RegexOptions, an example I pulled from MSDN is like this:
Dim rx As New Regex("\b(?<word>\w+)\s+(\k<word>)\b", RegexOptions.Compiled Or RegexOptions.IgnoreCase)
The key there is the "RegexOptions.IgnoreCase", that'll cause the effect that you were trying for with /pattern/i.
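To illustrate the difference in C#, the sketch below contrasts a Perl-style pattern string with the RegexOptions approach. The class and method names are made up for the example, and "hello" is a placeholder pattern.

```csharp
using System.Text.RegularExpressions;

public static class CaseInsensitiveDemo
{
    // In .NET the slashes and the trailing "i" are ordinary literal
    // characters, so this only matches input containing "/hello/i".
    public static bool MatchesPerlStyle(string input)
    {
        return Regex.IsMatch(input, "/hello/i");
    }

    // The option replaces Perl's /i modifier.
    public static bool MatchesWithOption(string input)
    {
        return Regex.IsMatch(input, "hello", RegexOptions.IgnoreCase);
    }
}
```

MatchesPerlStyle("HELLO") comes back false, while MatchesWithOption("HELLO") comes back true.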
The correct regex in your case would be
^.*&show_name_exact=true\"\>(.*)</a></p></li>$
Regular expressions are tricky, but at http://www.regular-expressions.info/ you can find a great tutorial.
/?show_name=(.*)&show_name_exact=true\">(.*)
would work as you expect, I believe. Another thing I notice is that you're reading Groups[1], but I believe you want Groups[2]: there will be 3 groups, where Groups[0] is the whole match, Groups[1] is the first capture group, and Groups[2] is the second.
Gl ;)
Because of the question mark before show_name. It is in input but not in pattern, thus no match.
Also, you try to match </i but the input doesn't contain this (it contains </li>).
First, the regex starts with "/show_name", but the target string has "/?show_name", so the literal text in front of the first group never matches.
This will cause the whole regex to fail.
Ok, let's break this down.
Test Data: "The Venture Bros</p></li>"
Original Regex: "/show_name=(.*?)&show_name_exact=true\">(.*?)</i"
Working Regex: "/\?show_name=(.*)&show_name_exact=true\">(.*)</a"
We'll start at the left and work our way to the right, through the regex.
"?" became "\?": a "?" means that the preceding character or group is optional. With a backslash before it, it matches a literal question mark.
"(.*?)" became "(.*)": the parentheses denote a group. A question mark directly after a quantifier such as "*" does not mean "optional"; it makes the quantifier lazy, so it matches as few characters as possible, while "(.*)" is greedy. With a single link in the input both forms capture the same text; keep the lazy form if the input can contain several links.
"</i" became "</a" this change was made to match your actual text which terminates the anchor with a "</a>" tag.
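Since the difference between "(.*?)" and "(.*)" only shows up when the closing text occurs more than once, here is a small sketch with made-up input that contains two links. QuantifierDemo and FirstGroup are hypothetical names.

```csharp
using System.Text.RegularExpressions;

public static class QuantifierDemo
{
    // Returns the first capture group of the first match, or null.
    public static string FirstGroup(string input, string pattern)
    {
        Match m = Regex.Match(input, pattern);
        return m.Success ? m.Groups[1].Value : null;
    }
}
```

With the input "<a>one</a><a>two</a>", the lazy pattern "<a>(.*?)</a>" captures "one", whereas the greedy pattern "<a>(.*)</a>" runs on to the last closing tag and captures "one</a><a>two".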
Suggested Regex: "[\\W]show_name=([^><\"]*)&show_name_exact=true\">([^<]*)<"
(The extra \'s were added to provide proper c# string escaping.)
A good tool for testing regular expressions in c#, is the regex-freetool at code.google.com