match multiple words instead of a single word - c#

I have the following regex:
private string tokenRegEx = @"\[%RC:(\w+)%\].*?";
It matches when I pass in the string below:
[%RC:TEST%]
However, the following string does not match:
[%RC:TEST ITEM%]
How can I modify the regex to allow spaces as well as whole words?

You need to change the \w pattern (which matches alphanumeric characters plus underscore only) to something more liberal. For example, this would also allow whitespace:
private string tokenRegEx = @"\[%RC:((\w|\s)+)%\].*?";
Of course the "correct" solution would need to take into account exactly what you consider acceptable input, which is kind of open to discussion at this point.

Try this:
@"\[%RC:(\w|\s)+%\].*?";

This would do it; you have to match a space too. You use a group (), but a character set [] is less expensive:
private string tokenRegEx = @"\[%RC:([ \w]+)%\].*?";
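To make the difference between the answers concrete, here is a minimal sketch. The class and method names are made up for illustration, and the trailing .*? is dropped because a lazy quantifier at the end of a pattern matches nothing. It also shows a side effect of the (\w|\s)+ form: since the group itself is repeated, Groups[1] holds only the last character it matched.

```csharp
using System;
using System.Text.RegularExpressions;

public static class TokenDemo
{
    // Character class inside the group: one or more word characters or spaces.
    private static readonly Regex SetRegex = new Regex(@"\[%RC:([ \w]+)%\]");

    // Repeated group: each iteration overwrites the previous capture.
    private static readonly Regex RepeatedGroupRegex = new Regex(@"\[%RC:(\w|\s)+%\]");

    // Returns the token name, or null when the input contains no token.
    public static string ExtractToken(string input)
    {
        Match m = SetRegex.Match(input);
        return m.Success ? m.Groups[1].Value : null;
    }

    // Returns whatever group 1 holds for the repeated-group pattern.
    public static string ExtractWithRepeatedGroup(string input)
    {
        Match m = RepeatedGroupRegex.Match(input);
        return m.Success ? m.Groups[1].Value : null;
    }

    public static void Main()
    {
        Console.WriteLine(ExtractToken("[%RC:TEST%]"));                  // TEST
        Console.WriteLine(ExtractToken("[%RC:TEST ITEM%]"));             // TEST ITEM
        Console.WriteLine(ExtractWithRepeatedGroup("[%RC:TEST ITEM%]")); // M
    }
}
```

If you only need a yes/no match, any of the patterns will do; the capture behavior only matters when you read the token back out.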

Related

Patterns with special characters in Microsoft.VisualBasic.CompilerServices.LikeOperator.LikeString do not work

I have tried to use the LikeOperator.LikeString functionality for pattern matching as shown below:
// Usage: bool matchValue = LikeOperator.LikeString(string, pattern, CompareMethod);
bool match = LikeOperator.LikeString("*test*/fe_quet", "(*)test(*)/*", Microsoft.VisualBasic.CompareMethod.Text);
The above should return true as per the documentation, but it simply returns false. I tried to escape the (*) with the brackets, but it does not seem to work in that way. Could anyone please help me to define the pattern string with the special characters?
Thanks
From Like Operator (which you provided):
To match the special characters left bracket ([), question mark (?), number sign (#), and asterisk (*), enclose them in brackets.
Therefore you need to wrap the literal asterisks in [] instead of (), and leave the trailing wildcard * unbracketed so it can still match "fe_quet":
bool match = LikeOperator.LikeString("*test*/fe_quet", "[*]test[*]/*", Microsoft.VisualBasic.CompareMethod.Text);
You'd probably be better off using Regex instead of the VB namespace.
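For the Regex route, something along these lines might work. It is only a sketch of what the pattern seems intended to mean (literal asterisks around "test", a slash, then anything); the class and method names are invented, and RegexOptions.IgnoreCase stands in for CompareMethod.Text on the assumption that case-insensitive matching is what you want.

```csharp
using System;
using System.Text.RegularExpressions;

public static class LikeToRegexDemo
{
    // \* matches a literal asterisk; .* plays the role of the Like wildcard *.
    public static bool MatchesTestPath(string input)
    {
        return Regex.IsMatch(input, @"^\*test\*/.*$", RegexOptions.IgnoreCase);
    }

    public static void Main()
    {
        Console.WriteLine(MatchesTestPath("*test*/fe_quet")); // True
        Console.WriteLine(MatchesTestPath("xtestx/fe_quet")); // False
    }
}
```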

Regular Expression Not working in .net

I'm using the following expression.
\W[A-C]{3}
The objective is to match 3 characters of anything between A and C that don't have any characters before them. So with input "ABC" it matches but "DABC" does not.
When I try this expression in various online regex tools (e.g. http://gskinner.com/RegExr/), it works perfectly. When I try to use it in an ASP.NET RegularExpressionValidator or with the Regex class, it never matches anything.
I've tried various other ways of disallowing a character before the match, e.g.
[^\w] and [^a-zA-Z0-9]
All of them work in the online tools, but not in .NET.
This test fails, but I'm not sure why:
[Test]
public void RegExWorks()
{
var regex = new Regex("\\W[A-C]{3}");
Match match = regex.Match("ABC");
Assert.IsTrue(match.Success);
}
How about something like this:
^[A-C]{3}
It is simple, but seems to fit what you are asking, and I tested it in rubular.com and .NET
The problem is that you require there to be a \W character. Use alternation to fix that, or a lookbehind to make sure there are no invalid characters.
Alternation:
(?:\W|^)[A-C]{3}
But I'd prefer a negative lookbehind:
(?<!\w)[A-C]{3}
\b (as in gymbrall's answer) is short for (?<!\w)(?=\w)|(?<=\w)(?!\w), which in this case would just mean (?<!\w), thus being equivalent.
Also, in C# you can use the @ verbatim-string prefix so you don't have to double-escape things, e.g.:
var regex = new Regex(@"(?<!\w)[A-C]{3}");
You should consider trying:
[Test]
public void RegExWorks()
{
var regex = new Regex("\\b[A-C]{3}");
Match match = regex.Match("ABC");
Assert.IsTrue(match.Success);
}
The \\b matches a word boundary, which means it will match "ABC" as well as " ABC" and "$ABC". Using \\W requires there to be a non-word character, which doesn't sound like it is what you want.
Let me know if I'm missing something.
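If you want to see how the suggested patterns differ, a small harness like the one below could help. The class name, method name and sample inputs are just for illustration. On these inputs, \W[A-C]{3} should match only " ABC" and ^[A-C]{3} only "ABC", while the alternation, lookbehind and \b variants match both "ABC" and " ABC". None of the five should match "DABC".

```csharp
using System;
using System.Text.RegularExpressions;

public static class BoundaryDemo
{
    // Thin wrapper so each pattern/input pair can be checked in one call.
    public static bool Matches(string pattern, string input)
    {
        return Regex.IsMatch(input, pattern);
    }

    public static void Main()
    {
        string[] patterns =
        {
            @"\W[A-C]{3}",       // requires a non-word character before the letters
            @"^[A-C]{3}",        // only at the very start of the input
            @"(?:\W|^)[A-C]{3}", // non-word character or start of input
            @"(?<!\w)[A-C]{3}",  // not preceded by a word character
            @"\b[A-C]{3}",       // word boundary before the letters
        };
        string[] inputs = { "ABC", "DABC", " ABC" };

        // Print one line per pattern/input combination.
        foreach (string p in patterns)
            foreach (string s in inputs)
                Console.WriteLine($"{p,-18} \"{s}\" -> {Matches(p, s)}");
    }
}
```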
It is as simple as this: "[A-C]{3}"
OK, so you can try the following expression:
"[A-C][A-C]{2}"

Check Formatting of a String

This has probably been answered somewhere before, but it is hard to find among the millions of unrelated posts about string formatting.
Take the following string:
24:Something(true;false;true)[0,1,0]
I want to be able to do two things in this case. I need to check whether or not all the following conditions are true:
There is only one : (done: achieved using Split(), which I needed anyway to separate the two parts).
The integer before the : is a 1-3 digit int (done: simple int.Parse logic).
The () exists, and the "Something", in this case any string of fewer than 10 characters, is there.
The [] exists and has at least 1 integer in it. Also, make sure the elements in the [] are integers separated by ,
How can I best do this?
EDIT: The items marked "done" are what I've achieved so far.
A regular expression is the quickest way. Depending on the complexity it may also be the most computationally expensive.
This seems to do what you need (I'm not that good so there might be better ways to do this):
^\d{1,3}:\w{1,9}\((true|false)(;true|;false)*\)\[\d(,[\d])*\]$
Explanation
\d{1,3}
1 to 3 digits
:
followed by a colon
\w{1,9}
followed by a 1-9 character string of word characters (letters, digits, underscore),
\((true|false)(;true|;false)*\)
followed by parentheses containing "true" or "false" followed by any number of ";true" or ";false",
\[\d(,[\d])*\]
followed by square brackets containing a digit, followed by any number of comma+digit.
The ^ and $ at the beginning and end of the pattern anchor it to the start and end of the string, which is important since we're trying to verify that the entire string matches the format.
Code Sample
var input = "24:Something(true;false;true)[0,1,0]";
var regex = new System.Text.RegularExpressions.Regex(@"^\d{1,3}:.{1,9}\(.*\)\[\d(,[\d])*\]$");
bool isFormattedCorrectly = regex.IsMatch(input);
Credit: @Ian Nelson
This is one of those cases where your only sensible option is to use a Regular Expression.
My hasty attempt is something like:
var input = "24:Something(true;false;true)[0,1,0]";
var regex = new System.Text.RegularExpressions.Regex(@"^\d{1,3}:.{1,9}\(.*\)\[\d(,[\d])*\]$");
System.Diagnostics.Debug.Assert(regex.IsMatch(input));
This online RegEx tester should help refine the expression.
I think, the best way is to use regular expressions like this:
string s = "24:Something(true;false;true)[0,1,0]";
Regex pattern = new Regex(@"^\d{1,3}:[a-zA-Z]{1,9}\((true|false)(;true|;false)*\)\[\d(,\d)*\]$");
if (pattern.IsMatch(s))
{
// s is valid
}
If you want to allow anything inside (), you can use the following regex:
@"^\d{1,3}:[a-zA-Z]{1,9}\([^:\(]*\)\[\d(,\d)*\]$"
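Putting the stricter variants together, a validator could look roughly like the sketch below. The class and method names are hypothetical, and \d+ is used inside the [] on the assumption that multi-digit integers such as 10 should be accepted, which the patterns above (single \d) do not allow.

```csharp
using System;
using System.Text.RegularExpressions;

public static class FormatCheck
{
    // 1-3 digits, ':', 1-9 word characters, a (true|false;...) list,
    // then a [n,n,...] list with at least one integer.
    private static readonly Regex Format = new Regex(
        @"^\d{1,3}:\w{1,9}\((true|false)(;(true|false))*\)\[\d+(,\d+)*\]$");

    public static bool IsValid(string s) => Format.IsMatch(s);

    public static void Main()
    {
        Console.WriteLine(IsValid("24:Something(true;false;true)[0,1,0]")); // True
        Console.WriteLine(IsValid("24:Something(true;false;true)[]"));      // False
        Console.WriteLine(IsValid("1234:Something(true)[0]"));              // False
    }
}
```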

Any ideas why this does not work? C#

using System;
using System.Text.RegularExpressions;

public class MyExample
{
    public static void Main(String[] args)
    {
        string input = "The Venture Bros</p></li>";
        // Call Regex.Match
        Match m = Regex.Match(input, "/show_name=(.*?)&show_name_exact=true\">(.*?)</i");
        // Check Match instance
        if (m.Success)
        {
            // Get Group value
            string key = m.Groups[1].Value;
            Console.WriteLine(key);
            // alternate-1
        }
    }
}
I want "The Venture Bros" as output (in this example).
try this :
string input = "The Venture Bros</p></li>";
// Call Regex.Match
Match m = Regex.Match(input, "show_name=(.*?)&show_name_exact=true\">(.*?)</a");
// Check Match instance
if (m.Success)
{
// Get Group value
string key = m.Groups[2].Value;
Console.WriteLine(key);
// alternate-1
}
I think it's because you're trying to do the perl-style slashes on the front and the end. A couple of other answerers have been confused by this already. The way he's written it, he's trying to do case-insensitive by starting and ending with / and putting an i on the end, the way you'd do it in perl.
But I'm pretty sure that .NET regexes don't work that way, and that's what's causing the problem.
Edit: to be more specific, look into RegexOptions, an example I pulled from MSDN is like this:
Dim rx As New Regex("\b(?<word>\w+)\s+(\k<word>)\b", RegexOptions.Compiled Or RegexOptions.IgnoreCase)
The key there is the "RegexOptions.IgnoreCase", that'll cause the effect that you were trying for with /pattern/i.
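In C# the same idea might look like the sketch below (class and method names are invented for the example). It contrasts a plain pattern plus RegexOptions.IgnoreCase with a perl-style /pattern/i string, where .NET treats the slashes and the trailing i as literal characters.

```csharp
using System;
using System.Text.RegularExpressions;

public static class IgnoreCaseDemo
{
    // Plain pattern; case-insensitivity is requested through RegexOptions.
    public static bool ContainsClosingAnchor(string html)
    {
        return Regex.IsMatch(html, "</a>", RegexOptions.IgnoreCase);
    }

    // Perl-style delimiters: here "/" and "i" must appear literally in the input.
    public static bool PerlStyle(string html)
    {
        return Regex.IsMatch(html, "/</a>/i");
    }

    public static void Main()
    {
        Console.WriteLine(ContainsClosingAnchor("text</A>")); // True
        Console.WriteLine(PerlStyle("text</A>"));             // False
    }
}
```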
The correct regex in your case would be
^.*&show_name_exact=true\"\>(.*)</a></p></li>$
regexp is tricky, but at http://www.regular-expressions.info/ you can find a great tutorial
/?show_name=(.*)&show_name_exact=true\">(.*)
would work as you expect I believe. But another thing I notice, is that you're trying to get the value of group[1], but I believe that you want the value of group[2], because there will be 3 groups, the first is the match, and the second is the first group...
Gl ;)
Because of the question mark before show_name. It is in input but not in pattern, thus no match.
Also, you try to match </i but the input doesn't contain this (it contains </li>).
First, the regex starts with "/show_name", but the target string has "/?show_name", so the literal prefix never lines up with the input.
This causes the whole regex to fail.
Ok, let's break this down.
Test Data: "The Venture Bros</p></li>"
Original Regex: "/show_name=(.*?)&show_name_exact=true\">(.*?)</i"
Working Regex: "/\?show_name=(.*)&show_name_exact=true\">(.*)</a"
We'll start at the left and work our way to the right, through the regex.
"?" became "\?": an unescaped "?" means that the preceding character or group is optional. With a backslash before it, it matches a literal question mark.
"(.*?)" became "(.*)": the parentheses denote a group. A "?" directly after "*" does not mean "optional"; it makes the quantifier lazy, so it matches as few characters as possible. With a single anchor in the input both forms capture the same text, so this change is not required, and the lazy form is safer when the input contains several anchors.
"</i" became "</a": this change was made to match your actual text, which terminates the anchor with a "</a>" tag.
Suggested Regex: "[\\W]show_name=([^><\"]*)&show_name_exact=true\">([^<]*)<"
(The extra \'s were added to provide proper c# string escaping.)
A good tool for testing regular expressions in c#, is the regex-freetool at code.google.com
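Pulling the fixes from the answers together, here is a rough sketch. Note that the sample markup is an assumption: the anchor tag in the question's input did not survive, so SampleHtml below is a guess at its general shape based on what the answers mention ("/?show_name", "</a></p></li>"). The class and method names are also made up.

```csharp
using System;
using System.Text.RegularExpressions;

public static class ShowNameDemo
{
    // Hypothetical input: assumed shape of the original markup, not the real data.
    public const string SampleHtml =
        "<li><p><a href=\"/?show_name=The+Venture+Bros&show_name_exact=true\">The Venture Bros</a></p></li>";

    // Escaped "?" after the slash, "</a" instead of "</i", and group 2 for the link text.
    public static string ExtractTitle(string html)
    {
        Match m = Regex.Match(
            html,
            "/\\?show_name=(.*?)&show_name_exact=true\">(.*?)</a",
            RegexOptions.IgnoreCase);
        return m.Success ? m.Groups[2].Value : null;
    }

    public static void Main()
    {
        Console.WriteLine(ExtractTitle(SampleHtml)); // The Venture Bros
    }
}
```

Groups[1] would hold the URL-encoded value ("The+Venture+Bros" for this sample), which is why the link text has to come from Groups[2].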

Replacing numbers in strings with C#

I thought I'd do a regex replace
Regex r = new Regex("[0-9]");
return r.Replace(sz, "#");
on a file named aa514a3a.4s5. It works exactly as I expect: it replaces all the digits, including those in the extension. How do I make it NOT replace the digits in the extension? I tried numerous regex strings, but I am beginning to think that it's an all-or-nothing pattern, so I can't do this. Do I need to separate the extension from the string, or can I use regex?
This one does it for me:
(?<!\.[0-9a-z]*)[0-9]
This does a negative lookbehind (the string must not occur before the matched string) on a period, followed by zero or more alphanumeric characters. This ensures only numbers are matched that are not in your extension.
Obviously, the [0-9a-z] must be replaced by whichever characters you expect in your extension.
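A quick way to try the lookbehind is sketched below; the class and method names are only for illustration. It relies on .NET allowing variable-length lookbehind, which many other regex engines do not support.

```csharp
using System;
using System.Text.RegularExpressions;

public static class MaskDigits
{
    // Replaces a digit only when no ".<lowercase alphanumerics>" run ends right before it.
    public static string MaskBaseName(string fileName)
    {
        return Regex.Replace(fileName, @"(?<!\.[0-9a-z]*)[0-9]", "#");
    }

    public static void Main()
    {
        Console.WriteLine(MaskBaseName("aa514a3a.4s5")); // aa###a#a.4s5
    }
}
```

One thing to keep in mind: the pattern protects everything after the first dot it can reach, so for a name like a1.b2.txt the 2 is left alone as well.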
I don't think you can do that with a single regular expression.
Probably best to split the original string into base and extension; do the replace on the base; then join them back up.
Yes, I think you'd be better off separating the extension.
If you are sure there is always a 3-character extension at the end of your string, the easiest, most readable/maintainable solution would be to only perform the replace on
yourString.Substring(0, yourString.Length-4)
..and then append
yourString.Substring(yourString.Length-4, 4)
Why not run the regex on the substring?
String filename = "aa514a3a.4s5";
String nameonly = filename.Substring(0,filename.Length-4);
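If the extension is not always three characters, System.IO.Path can do the splitting instead of hard-coded Substring offsets. This is a sketch under the assumption that the input is a bare file name without a directory part; the class and method names are invented.

```csharp
using System;
using System.IO;
using System.Text.RegularExpressions;

public static class MaskWithPath
{
    public static string MaskBaseName(string fileName)
    {
        // Everything before the last '.'; any directory part would be dropped here.
        string baseName = Path.GetFileNameWithoutExtension(fileName);
        // The extension including its leading '.', or "" when there is none.
        string extension = Path.GetExtension(fileName);
        // Mask digits in the base name only, then reattach the extension untouched.
        return Regex.Replace(baseName, "[0-9]", "#") + extension;
    }

    public static void Main()
    {
        Console.WriteLine(MaskWithPath.MaskBaseName("aa514a3a.4s5")); // aa###a#a.4s5
    }
}
```

Unlike the lookbehind version, this treats only the part after the last dot as the extension.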
