Need help with a Regular Expression, Pattern Matching in C#? - c#

I need some help with a simple pattern matching and replacement exercise I am doing?
I need to match both of the following two strings in any string in a given context and it is expected that both patterns are to exist in a given supplied string.
1) "width=000" or "width=00" or "width=0"
2) "drop=000" or "drop=00" or "drop=0"
The values can be any values between 0-9 for each case so '000' --> '999' could a valid test case in a supplied test.
string url = Regex.Replace(inputString, patternString, replacementValueString);
Thanks,

Have a look at this page to explain the individual elements: http://msdn.microsoft.com/en-us/library/az24scfc.aspx
A regex string like this should work great:
"\b(?:width|drop)\s*=\s*\d{1,3}\b"
To read the name and value in your code:
"\b(?<name>width|drop)\s*=\s*(?<value>\d{1,3})\b"
If you do not need to limit the numbers to only 3 digits, you could use the "\d+" instead of "\d{1,3}".
The "\b" at the beginning will make sure that you don't get a "width" or "drop" that is part of some larger word. The "\b" at the end will prevent you from matching numbers larger than 999.
The "\s*" on either side of the equals statement allow for "drop = 000" as well as "drop=000".

Something like this would work :
(?:width|drop)=\d{1,3}

Related

What is the correct RegEx to extract my substring

I have an input string like this
$(xx.xx.xx)abcde$(yyy.yyy.yyy)fghijk$(zzz.zz.zz.zzz)
I want to be able to pull out each subset of strings matching $(anything inside here), so for the example above I would like to get 3 substrings.
the characters in between the brackets do not necessarily always match the same pattern.
I have tried using the following regex
(\$\([a-z]+.*\))
but this matches whole string, due to the fact it starts with '$', anything in middle, and ends with ')'
Hopefully this makes sense.
I should also note that I have very limited experience using regex.
Thanks
(\$\([a-z]+.*?\))
Use ? to make your search non greedy.* is greedy and consumes the max it can.adding ? to * makes it non greedy and it will stop at the first instance of ).
See demo.
http://regex101.com/r/sU3fA2/28
try the below
\((.*?)\)\g
for the given string $(xx.xx.xx)abcde$(yyy.yyy.yyy)fghijk$(zzz.zz.zz.zzz) it returns the three substring..
MATCH 1
1. [2-10] `xx.xx.xx`
MATCH 2
1. [18-29] `yyy.yyy.yyy`
MATCH 3
1. [38-51] `zzz.zz.zz.zzz`
http://regex101.com/r/bX7qR2/1

What Regex would I use to match file names with the format 'number-text-text'?

I have code that searches a folder that contains SQL patch files. I want to add file names to an array if they match the following name format:
number-text-text.sql
What Regex would I use to match number-text-text.sql?
Would it be possible to make the Regex match file names where the number part of the file name is between two numbers? If so what would be the Regex syntax for this?
The following regex make it halfway there:
\d+-[a-zA-Z]+-[a-zA-Z]+\.sql
Regarding to match in a specific range it gets trickier as regex doesn't have a simple way to handle ranges. To limit the match to a filename with a number between 2 and 13 this is the regex:
([2-9]|1[0-3])-[a-zA-Z]+-[a-zA-Z]+\.sql
Your regular expression should be:
(\d+)-[a-zA-Z]+-[a-zA-Z]+\.sql
You would then use the first captured group to check if your number is between the two numbers you desire. Don't try to check if a number is within a range with a regular expression; do it in two steps. Your code will be much clearer for it.
How about:
\d+-[^-]+-[^-]+\.sql
Edit: You want just letters, so here it is without specific ranges.
\d+-[a-z]+-[a-z]+\.sql - You'll also want to use the i flag, not sure how that's done in c#, but here it is in js/perl:
/\d+-[a-z]+-[a-z]+\.sql/i
Ranges are more difficult. Here's an example of how to match 0-255:
([0-9]|[1-9][0-9]|1[0-9][0-9]|2[0-4][0-9]|25[0-5])
So to match (0-255)-text-text.sql, you'd have this:
/^(?:[0-9]|[1-9][0-9]|1[0-9][0-9]|2[0-4][0-9]|25[0-5])-[a-z]+-[a-z]+\.sql/i
(I put the digits in a non-capturing group and matched from the beginning of the string to prevent partial matches on the number and in case you're expecting numbered groups or something).
Basically every time you need another digit of possibility, you'll need to add a new condition inside this case. The smaller the digit you'd like to match, the more cases you'll need as well. What is your desired min/max? AFAIK there's not a simple way to do this dynamically (although I'd love for someone to show me I'm wrong about that).
The simplest way to get around this would be to simply capture the digits, and use native syntax to see if it's in your range. Example in js:
var match = filename.match(/(\d+)-[a-z]+-[a-z]+\.sql/i);
if(match && match[1] < maximumNumber && match[1] > minimumNumber){
doStuff();
}
This should work:
select '4-dfsg-asdfg.sql' ~ E'^[0-9]+-[a-zA-Z]+-[a-zA-Z]+\\.sql$'
This restricts the TEXT to simple ASCII characters. May or may not be what you want.
This is tested in PostgreSQL. Regular expression flavors differ a lot between implementations. You probably know that?
Anchors at begin ^ and end $ are optional, depending how you are going to do it.

Check Formatting of a String

This has probably been answered somewhere before but since there are millions of unrelated posts about string formatting.
Take the following string:
24:Something(true;false;true)[0,1,0]
I want to be able to do two things in this case. I need to check whether or not all the following conditions are true:
There is only one : Achieved using Split() which I needed to use anyway to separate the two parts.
The integer before the : is a 1-3 digit int Simple int.parse logic
The () exists, and that the "Something", in this case any string less than 10 characters, is there
The [] exists and has at least 1 integer in it. Also, make sure the elements in the [] are integers separated by ,
How can I best do this?
EDIT: I have crossed out what I've achieved so far.
A regular expression is the quickest way. Depending on the complexity it may also be the most computationally expensive.
This seems to do what you need (I'm not that good so there might be better ways to do this):
^\d{1,3}:\w{1,9}\((true|false)(;true|;false)*\)\[\d(,[\d])*\]$
Explanation
\d{1,3}
1 to 3 digits
:
followed by a colon
\w{1,9}
followed by a 1-9 character alpha-numeric string,
\((true|false)(;true|;false)*\)
followed by parenthesis containing "true" or "false" followed by any number of ";true" or ";false",
\[\d(,[\d])*\]
followed by another set of parenthesis containing a digit, followed by any number of comma+digit.
The ^ and $ at the beginning and end of the string indicate the start and end of the string which is important since we're trying to verify the entire string matches the format.
Code Sample
var input = "24:Something(true;false;true)[0,1,0]";
var regex = new System.Text.RegularExpressions.Regex(#"^\d{1,3}:.{1,9}\(.*\)\[\d(,[\d])*\]$");
bool isFormattedCorrectly = regex.IsMatch(input);
Credit # Ian Nelson
This is one of those cases where your only sensible option is to use a Regular Expression.
My hasty attempt is something like:
var input = "24:Something(true;false;true)[0,1,0]";
var regex = new System.Text.RegularExpressions.Regex(#"^\d{1,3}:.{1,9}\(.*\)\[\d(,[\d])*\]$");
System.Diagnostics.Debug.Assert(regex.IsMatch(input));
This online RegEx tester should help refine the expression.
I think, the best way is to use regular expressions like this:
string s = "24:Something(true;false;true)[0,1,0]";
Regex pattern = new Regex(#"^\d{1,3}:[a-zA-z]{1,10}\((true|false)(;true|;false)*\)\[\d(,\d)*\]$");
if (pattern.IsMatch(s))
{
// s is valid
}
If you want anything inside (), you can use following regex:
#"^\d{1,3}:[a-zA-z]{1,10}\([^:\(]*\)\[\d(,\d)*\]$"

Regex.Matches returns one match per line, not per "word"

I'm having a hard time understanding why the following expression \\[B.+\\] and code returns a Matches count of 1:
string r = "\\[B.+\\]";
return Regex.Matches(Markup, sRegEx);
I want to find all the instances (let's call them 'tags') (in a variable length HTML string Markup that contains no line breaks) that are prefixed by B and are enclosed in square brackets.
If the markup contains [BName], I get one match - good.
If the markup contains [BName] [BAddress], I get one match - why?
If the markup contains [BName][BAddress], I also only get one match.
On some web-based regex testers, I've noticed that if the text contains a CR character, I'll get a match per line - but I need some way to specify that I want matches returned independent of line breaks.
I've also poked around in the Groups and Captures collections of the MatchCollection, but to no avail - always just one result.
You are getting only one match because, by default, .NET regular expressions are "greedy"; they try to match as much as possible with a single match.
So if your value is [BName][BAddress] you will have one match - which will match the entire string; so it will match from the [B at the beginning all the way to the last ] - instead of the first one. If you want two matches, use this pattern instead: \\[B.+?\\]
The ? after the + tells the matching engine to match as little as possible... leaving the second group to be its own match.
Slaks also noted an excellent option; specifying specifically that you do not wish to match the ending ] as part of the content, like so: \\[B[^\\]]+\\] That keeps your match 'greedy', which might be useful in some other case. In this specific instance, there may not be much difference - but it's an important thing to keep in mind depending on what data/patterns you might be dealing with specifically.
On a side note, I recommend using the C# "literal string" specifier # for regular expression patterns, so that you do not need to double-escape things in regex patterns; So I would set the pattern like so:
string pattern = #"\[B.+?\]";
This makes it much easier to figure out regular expressions that are more complex
Try the regex string \\[B.+?\\] instead. .+ on it's own (same is pretty much true for .*) will match against as many characters as possible, whereas .+? (or .*?) will match against the bare minimum number of characters whilst still satisfying the rest of the expression.
.+ is a greedy match; it will match as much as possible.
In your second example, it matches BName] [BAddress.
You should write \[B[^\]]+\].
[^\]] matches every character except ], so it is forced to stop before the first ].

Regex which ensures no character is repeated

I need to ensure that a input string follows these rules:
It should contain upper case characters only.
NO character should be repeated in the string.
eg. ABCA is not valid because 'A' is being repeated.
For the upper case thing, [A-Z] should be fine.
But i am lost at how to ensure no repeating characters.
Can someone suggest some method using regular expressions ?
You can do this with .NET regular expressions although I would advise against it:
string s = "ABCD";
bool result = Regex.IsMatch(s, #"^(?:([A-Z])(?!.*\1))*$");
Instead I'd advise checking that the length of the string is the same as the number of distinct characters, and checking the A-Z requirement separately:
bool result = s.Cast<char>().Distinct().Count() == s.Length;
Alteranatively, if performance is a critical issue, iterate over the characters one by one and keep a record of which you have seen.
This cannot be done via regular expressions, because they are context-free. You need at least context-sensitive grammar language, so only way how to achieve this is by writing the function by hand.
See formal grammar for background theory.
Why not check for a character which is repeated or not in uppercase instead ? With something like ([A-Z])?.*?([^A-Z]|\1)
Use negative lookahead and backreference.
string pattern = #"^(?!.*(.).*\1)[A-Z]+$";
string s1 = "ABCDEF";
string s2 = "ABCDAEF";
string s3 = "ABCDEBF";
Console.WriteLine(Regex.IsMatch(s1, pattern));//True
Console.WriteLine(Regex.IsMatch(s2, pattern));//False
Console.WriteLine(Regex.IsMatch(s3, pattern));//False
\1 matches the first captured group. Thus the negative lookahead fails if any character is repeated.
This isn't regex, and would be slow, but You could create an array of the contents of the string, and then iterate through the array comparing n to n++
=Waldo
It can be done using what is call backreference.
I am a Java program so I will show you how it is done in Java (for C#, see here).
final Pattern aPattern = Pattern.compile("([A-Z]).*\\1");
final Matcher aMatcher1 = aPattern.matcher("ABCDA");
System.out.println(aMatcher1.find());
final Matcher aMatcher2 = aPattern.matcher("ABCDA");
System.out.println(aMatcher2.find());
The regular express is ([A-Z]).*\\1 which translate to anything between 'A' to 'Z' as group 1 ('([A-Z])') anything else (.*) and group 1.
Use $1 for C#.
Hope this helps.

Categories

Resources