Regex which ensures no character is repeated - c#

I need to ensure that an input string follows these rules:
It should contain upper case characters only.
No character should be repeated in the string.
e.g. ABCA is not valid because 'A' is repeated.
For the upper case part, [A-Z] should be fine.
But I am lost as to how to ensure there are no repeating characters.
Can someone suggest a method using regular expressions?

You can do this with .NET regular expressions although I would advise against it:
string s = "ABCD";
bool result = Regex.IsMatch(s, @"^(?:([A-Z])(?!.*\1))*$");
Instead I'd advise checking that the length of the string is the same as the number of distinct characters, and checking the A-Z requirement separately:
bool result = s.Cast<char>().Distinct().Count() == s.Length;
Alternatively, if performance is a critical issue, iterate over the characters one by one and keep a record of which you have seen.
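The character-by-character approach might look roughly like the sketch below. The class and method names (UniqueUpper, IsValid) are made up for illustration, and the sketch assumes that an empty string should count as valid, which is also what the regex above accepts.

```csharp
using System;

static class UniqueUpper
{
    // Returns true when s contains only the letters A-Z and no letter occurs twice.
    // Assumption: the empty string is treated as valid.
    public static bool IsValid(string s)
    {
        var seen = new bool[26];              // one flag per letter A..Z
        foreach (char c in s)
        {
            if (c < 'A' || c > 'Z') return false;   // not an upper case ASCII letter
            int index = c - 'A';
            if (seen[index]) return false;          // this letter was already seen
            seen[index] = true;
        }
        return true;
    }
}
```

This makes a single pass over the string and stops at the first offending character, so it does the A-Z check and the duplicate check together.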

This cannot be done via regular expressions, because they are context-free. You need at least a context-sensitive grammar, so the only way to achieve this is to write the function by hand.
See formal grammar for background theory.

Why not check for a character which is repeated or not in uppercase instead ? With something like ([A-Z])?.*?([^A-Z]|\1)

Use negative lookahead and backreference.
string pattern = @"^(?!.*(.).*\1)[A-Z]+$";
string s1 = "ABCDEF";
string s2 = "ABCDAEF";
string s3 = "ABCDEBF";
Console.WriteLine(Regex.IsMatch(s1, pattern));//True
Console.WriteLine(Regex.IsMatch(s2, pattern));//False
Console.WriteLine(Regex.IsMatch(s3, pattern));//False
\1 matches the first captured group. Thus the negative lookahead fails if any character is repeated.

This isn't regex, and would be slow, but you could create an array of the characters in the string and then iterate through the array, comparing each element with the ones after it.
=Waldo

It can be done using what is called a backreference.
I am a Java programmer, so I will show you how it is done in Java (for C#, see here).
final Pattern aPattern = Pattern.compile("([A-Z]).*\\1");
final Matcher aMatcher1 = aPattern.matcher("ABCDA");
System.out.println(aMatcher1.find());
final Matcher aMatcher2 = aPattern.matcher("ABCDA");
System.out.println(aMatcher2.find());
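The regular expression is ([A-Z]).*\\1, which translates to: any character from 'A' to 'Z' as group 1 ('([A-Z])'), then anything else (.*), then group 1 again.
In C# the backreference inside the pattern is also written \1; $1 is only used in replacement strings.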
Hope this helps.

Related

Regex to extract string between parentheses which also contains other parentheses

I've been trying to figure this out, but I don't think I understand Regex well enough to get to where I need to.
I have string that resemble these:
filename.txt(1)attribute, 2)attribute(s), more!)
otherfile.txt(abc, def)
Basically, a string that always starts with a filename, then has some text between parentheses. And I'm trying to extract that part which is between the main parentheses, but the text that's there can contain absolutely anything, even some more parentheses (it often does.)
Originally, there was a 'hacky' expression made like this:
/\(([^#]+)\)\g
And it worked, until we ran into a case where the input string contained a # and we were stuck. Obviously...
I can't change the way the strings are generated, it's always a filename, then some parentheses and something of unknown length and content inside.
I'm hoping for a simple Regex expression, since I need this to work in both C# and in Perl -- is such a thing possible? Or does this require something more complex, like its own parsing method?
You can replace the part of your regex that excludes the # symbol with a pattern that matches any character, followed by a quantifier that matches zero or more times. You can also simplify the regex by dropping the capturing group:
\(.*\)
Here is the explanation for the regular expression:
Symbol \( matches the character ( literally.
.* matches any character (except for line terminators)
* quantifier matches between zero and unlimited times, as many times as possible, giving back as needed (greedy)
\) matches the character ) literally.
You can use regex101 to compose and debug your regular expressions.
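As a rough C# illustration of the greedy pattern above: the sketch below keeps a capturing group (unlike the simplified \(.*\)) so that only the text between the outer parentheses is returned. The names ParenExtractor and Inner are hypothetical, and the sketch assumes the input is a single line, since . does not match line terminators by default.

```csharp
using System;
using System.Text.RegularExpressions;

static class ParenExtractor
{
    // Returns the text between the first '(' and the last ')',
    // or null when the input has no such pair.
    public static string Inner(string input)
    {
        // Greedy .* runs to the last ')' in the string, so nested
        // parentheses inside the outer pair are kept as-is.
        Match m = Regex.Match(input, @"\((.*)\)");
        return m.Success ? m.Groups[1].Value : null;
    }
}
```

Because the quantifier is greedy, the match always ends at the last closing parenthesis, which is what makes this work for inner text that itself contains parentheses.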
Regex seems overkill to me in this case. This can be achieved more reliably using string manipulation methods.
int first = str.IndexOf('(');
int last = str.LastIndexOf(')');
// Require the last ')' to come after the first '(' so the length is never negative.
if (first != -1 && last > first)
{
    string subString = str.Substring(first + 1, last - first - 1);
}
I've never used Perl, but I'll venture a guess that it has equivalent methods.

Regex to ensure that in a string such as "05123:12315", the first number is less than the second?

I must have strings in the format x:y where x and y have to be five digits (zero padded) and x <= y.
Example:
00515:02152
What Regex will match this format?
If possible, please explain the solution briefly to help me learn.
EDIT: Why do I need Regex? I've written a generic tool that takes input and validates it according to a configuration file. An unexpected requirement popped up that would require me to validate a string in the format I've shown (using the configuration file). I was hoping to solve this problem using the existing configuration framework I've coded up, as splitting and parsing would be out of the scope of this tool. For an outstanding requirement such as this, I don't mind having some unorthodox/messy regex, as long as it's not 10000 lines long. Any intelligent solutions using Regex are appreciated! Thanks.
Description
This expression will validate that the first 5 digit number is smaller than the second 5 digit number, where the zero padded 5 digit numbers are in a : delimited string formatted as 01234:23456.
^
(?:
(?=0....:[1-9]|1....:[2-9]|2....:[3-9]|3....:[4-9]|4....:[5-9]|5....:[6-9]|6....:[7-9]|7....:[8-9]|8....:[9])
|(?=(.)(?:0...:\1[1-9]|1...:\1[2-9]|2...:\1[3-9]|3...:\1[4-9]|4...:\1[5-9]|5...:\1[6-9]|6...:\1[7-9]|7...:\1[8-9]|8...:\1[9]))
|(?=(..)(?:0..:\2[1-9]|1..:\2[2-9]|2..:\2[3-9]|3..:\2[4-9]|4..:\2[5-9]|5..:\2[6-9]|6..:\2[7-9]|7..:\2[8-9]|8..:\2[9]))
|(?=(...)(?:0.:\3[1-9]|1.:\3[2-9]|2.:\3[3-9]|3.:\3[4-9]|4.:\3[5-9]|5.:\3[6-9]|6.:\3[7-9]|7.:\3[8-9]|8.:\3[9]))
|(?=(....)(?:0:\4[1-9]|1:\4[2-9]|2:\4[3-9]|3:\4[4-9]|4:\4[5-9]|5:\4[6-9]|6:\4[7-9]|7:\4[8-9]|8:\4[9]))
)
\d{5}:\d{5}$
Live demo: http://www.rubular.com/r/w1QLZhNoEa
Note that this uses the x option to ignore all white space and allow comments. If you use it without x, the expression needs to be all on one line.
The language you want to recognize is finite, so the easiest thing to do is just list all the cases separated by "or". The regexp you want is:
(00000:[00000|00001| ... 99999])| ... |(99998:[99998|99999])|(99999:99999)
That regexp will be several billion characters long and take quite some time to execute, but it is what you asked for: a regular expression that matches the stated language.
Obviously that's impractical. Now is it clear why regular expressions are the wrong tool for this job? Use a regular expression to match 5 digits - colon - five digits, and then once you know you have that, split up the string and convert the two sets of digits to integers that you can compare.
x <= y.
Well, you are using the wrong tool. Really, regex can't help you here. Even if you do get a solution, it will be too complex and too difficult to extend.
Regex is a text-processing tool for matching patterns in regular languages. It is very weak when it comes to semantics: it cannot identify meaning in the given string. In your case, to check the x <= y condition, you need to know the numerical values.
For example, it can match digits in a sequence, or a mix of digits and characters, but what it cannot do is things like:
match a number greater than 15 and less than 1245, or
match a pattern which is a date between two given dates.
So wherever matching a pattern involves applying semantics to the matched string, regex is not an option.
The appropriate way here would be to split the string on the colon and then compare the numbers. For the leading zeros, you can find some workaround.
You can't generally* do this with regex. You can use regex to match the pattern and extract the numbers, then compare the numbers in your code.
For example to match such format (without comparing the numbers) and get the numbers you could use:
^(\d{5}):(\d{5})\z
*) You probably could in this case (as the numbers are always 5 digits and zero padded), but it wouldn't be nice.
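A minimal sketch of that two-step idea, assuming the input is the whole string to validate. The names RangeCheck and IsValid are hypothetical. The sketch uses [0-9] rather than \d because in .NET \d also matches non-ASCII digits by default, which int.Parse would then reject.

```csharp
using System;
using System.Globalization;
using System.Text.RegularExpressions;

static class RangeCheck
{
    // Step 1: the regex only checks the shape "ddddd:ddddd" and captures both parts.
    static readonly Regex Format = new Regex(@"^([0-9]{5}):([0-9]{5})\z");

    // Step 2: ordinary code compares the captured numbers.
    public static bool IsValid(string s)
    {
        Match m = Format.Match(s);
        if (!m.Success) return false;

        int x = int.Parse(m.Groups[1].Value, CultureInfo.InvariantCulture);
        int y = int.Parse(m.Groups[2].Value, CultureInfo.InvariantCulture);
        return x <= y;
    }
}
```

Since both numbers are zero padded to the same width, an ordinal string comparison of the two captures would give the same answer; parsing to int is used here only because it states the intent more directly.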
You should do something like this instead:
bool IsCorrect(string s)
{
    // Split on the colon; exactly two 5-character parts are expected.
    string[] split = s.Split(':');
    int number1, number2;
    if (split.Length == 2 && split[0].Length == 5 && split[1].Length == 5)
    {
        if (int.TryParse(split[0], out number1) && int.TryParse(split[1], out number2) && number1 <= number2)
        {
            return true;
        }
    }
    return false;
}
With regex you can't make comparisons to see if a number is bigger than another number.
Let me show you a good example of why you shouldn't try to do this. This is a regex that (nearly) does the same job.
https://gist.github.com/anonymous/ad74e73f0350535d09c1
Raw file:
https://gist.github.com/anonymous/ad74e73f0350535d09c1/raw/03ea835b0e7bf7ac3c5fb6f9c7e934b83fb09d95/gistfile1.txt
Except it's just for 3 digits. For 4, the program that generates these fails with an OutOfMemoryException, even with gcAllowVeryLargeObjects enabled. It grew to 5 GB before it crashed. You don't want most of your app to be a Regex, right?
This is not a Regex's job.
This is a two-step process because regex is a text parser and not an analyzer. That said, regex is perfect for validating that we have the 5:5 number pattern, i.e. the form \d\d\d\d\d:\d\d\d\d\d. If that form is not found, the match fails and the whole validation fails. If it is found, we can use regex/LINQ to parse out the numbers and then check them for validity.
This code would be inside a method to do the check
var data = "00515:02151";
var pattern = @"
^ # starting from the beginning of the string...
(?=[\d:]{11}) # Are the next 11 characters only digits and ':'? Fail if not
(?=\d{5}:\d{5}) # Does it fall into our pattern? If not fail the match
((?<Values>[^:]+)(?::?)){2}
";
// IgnorePatternWhitespace only allows us to comment the pattern, it does not affect the regex parsing
var result = Regex.Matches(data, pattern, RegexOptions.IgnorePatternWhitespace)
.OfType<Match>()
.Select (mt => mt.Groups["Values"].Captures
.OfType<Capture>()
.Select (cp => int.Parse(cp.Value)))
.FirstOrDefault();
// Two values at this point 515, 2151
bool valid = ((result != null) && (result.First () <= result.Last ())); // the requirement is x <= y
Console.WriteLine (valid); // True
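Using JavaScript this can work.
var string = "00515:02152";
var result = string.replace(/(\d{5}):(\d{5})/, function (match, x, y) {
    // Keep the text when x <= y; otherwise the match is replaced by "null".
    return (parseInt(x, 10) <= parseInt(y, 10)) ? match : null;
});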
FIDDLE http://jsfiddle.net/VdzF7/

Retrieve a Digit from a String using Regex

What I am trying to do is fairly simple, although I am running into difficulty. I have a string that is a URL; it will have the format http://www.somedomain.com?id=someid and what I want to retrieve is the someid part. I figure I can use a regular expression, but I'm not very good with them. This is what I tried:
Match match = Regex.Match(theString, @"*.?id=(/d.)");
I get a regex exception saying there was an error parsing the regex. The way I am reading this is "any number of characters", then the literal "?id=", followed by "any number of digits". I put the digits in a group so I could pull them out. I'm not sure what is wrong with this. If anyone could tell me what I'm doing wrong I would appreciate it, thanks!
No need for Regex. Just use built-in utilities.
// HttpUtility lives in the System.Web namespace.
string query = new Uri("http://www.somedomain.com?id=someid").Query;
var dict = HttpUtility.ParseQueryString(query);
var value = dict["id"];
You've got a couple of errors in your regex. Try this:
Match match = Regex.Match(theString, @".*\?id=(\d+)");
Specifically, I:
changed *. to .* (dot matches all non-newline chars and * means zero or more of the preceding)
added an escape sequence before the ? because the question mark is a special character in regular expressions. It means zero or one of the preceding.
changed /d. to \d+ (you had the slash going the wrong way and you used dot, which was explained above, instead of +, which means one or more of the preceding)
Try
var match = Regex.Match(theString, @".*\?id=(\d+)");
The error is probably due to preceding *. The * character in regex matches zero or more occurrences of previous character; so it cannot be the first character.
Probably a typo, but shortcut for digit is \d, not /d
. matches any character, you need to match one or more digits - so use a +
? is a special character, so it needs to be escaped.
So it becomes:
Match match = Regex.Match(theString, @".*\?id=(\d+)");
That being said, regex is not the best tool for this; use a proper query string parser or things will eventually become difficult to manage.
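For completeness, here is a small sketch of how the captured digits could be read back out of the match. The names IdExtractor and GetId are hypothetical. The leading .* is left out because Regex.Match already searches anywhere in the input, and [0-9] is used so that only ASCII digits are accepted.

```csharp
using System;
using System.Text.RegularExpressions;

static class IdExtractor
{
    // Returns the digits that follow "?id=", or null when there is no numeric id.
    public static string GetId(string url)
    {
        Match match = Regex.Match(url, @"\?id=([0-9]+)");
        // Groups[0] is the whole match; Groups[1] is the first capturing group.
        return match.Success ? match.Groups[1].Value : null;
    }
}
```

Checking match.Success before reading the group avoids treating an empty group value as a real id.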

Check Formatting of a String

This has probably been answered somewhere before, but there are millions of unrelated posts about string formatting, so I am asking here.
Take the following string:
24:Something(true;false;true)[0,1,0]
I want to be able to do two things in this case. I need to check whether or not all the following conditions are true:
There is only one : (done: achieved using Split(), which I needed to use anyway to separate the two parts).
The integer before the : is a 1-3 digit int (done: simple int.Parse logic).
The () exists, and the "Something", in this case any string of fewer than 10 characters, is there.
The [] exists and has at least 1 integer in it. Also, make sure the elements in the [] are integers separated by ,
How can I best do this?
EDIT: I have marked as done what I've achieved so far.
A regular expression is the quickest way. Depending on the complexity it may also be the most computationally expensive.
This seems to do what you need (I'm not that good so there might be better ways to do this):
^\d{1,3}:\w{1,9}\((true|false)(;true|;false)*\)\[\d(,[\d])*\]$
Explanation
\d{1,3}
1 to 3 digits
:
followed by a colon
\w{1,9}
followed by a 1-9 character alpha-numeric string,
\((true|false)(;true|;false)*\)
followed by parentheses containing "true" or "false" followed by any number of ";true" or ";false",
\[\d(,[\d])*\]
followed by a set of square brackets containing a digit, followed by any number of comma+digit.
The ^ and $ at the beginning and end of the string indicate the start and end of the string which is important since we're trying to verify the entire string matches the format.
Code Sample
var input = "24:Something(true;false;true)[0,1,0]";
var regex = new System.Text.RegularExpressions.Regex(@"^\d{1,3}:.{1,9}\(.*\)\[\d(,[\d])*\]$");
bool isFormattedCorrectly = regex.IsMatch(input);
Credit @Ian Nelson
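The stricter pattern from the explanation above can be exercised the same way. This is only a sketch: FormatChecker and IsWellFormed are hypothetical names, and the pattern is copied unchanged from the explanation, so it only accepts single digits inside the square brackets.

```csharp
using System;
using System.Text.RegularExpressions;

static class FormatChecker
{
    // 1-3 digits, a colon, a 1-9 character word, a ';'-separated list of
    // true/false in parentheses, then a ','-separated list of single digits
    // in square brackets.
    static readonly Regex Strict = new Regex(
        @"^\d{1,3}:\w{1,9}\((true|false)(;true|;false)*\)\[\d(,[\d])*\]$");

    public static bool IsWellFormed(string input)
    {
        return Strict.IsMatch(input);
    }
}
```

If multi-digit integers should be allowed in the brackets, each \d in the bracket part would presumably need to become \d+.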
This is one of those cases where your only sensible option is to use a Regular Expression.
My hasty attempt is something like:
var input = "24:Something(true;false;true)[0,1,0]";
var regex = new System.Text.RegularExpressions.Regex(@"^\d{1,3}:.{1,9}\(.*\)\[\d(,[\d])*\]$");
System.Diagnostics.Debug.Assert(regex.IsMatch(input));
This online RegEx tester should help refine the expression.
I think, the best way is to use regular expressions like this:
string s = "24:Something(true;false;true)[0,1,0]";
Regex pattern = new Regex(@"^\d{1,3}:[a-zA-Z]{1,9}\((true|false)(;true|;false)*\)\[\d(,\d)*\]$");
if (pattern.IsMatch(s))
{
// s is valid
}
If you want anything inside (), you can use following regex:
@"^\d{1,3}:[a-zA-Z]{1,9}\([^:\(]*\)\[\d(,\d)*\]$"

Extending [^,]+, Regular Expression in C#

Duplicate
Regex for variable declaration and initialization in c#
I was looking for a Regular Expression to parse CSV values, and I came across this Regular Expression
[^,]+
which does my work by splitting the words on every occurrence of a ",". What I want to know is: say I have the string
value_name v1,v2,v3,v4,...
Now I want a regular expression to find me the words v1,v2,v3,v4..
I tried ->
^value_name\s+([^,]+)*
But it didn't work for me. Can you tell me what I am doing wrong? I remember working on regular expressions and their state machine implementation. Doesn't it work in the same way?
If a string starts with value_name followed by one or more whitespace characters, go to the next state. In that state, read a word until a "," comes. Then do it again! And each word will be grouped!
Am I wrong in understanding it?
You could use a Regex similar to those proposed:
(?:^value_name\s+)?([^,]+)(?:\s*,\s*)?
The first group is non-capturing and would match the start of the line and the value_name.
To ensure that the Regex is still valid over all matches, we make that group optional by using the '?' modifier (meaning match at most once).
The second group is capturing and would match your vXX data.
The third group is non-capturing and would match the ,, and any whitespace before and after it.
Again, we make it optional by using the '?' modifier, otherwise the last 'vXX' group would not match unless we ended the string with a final ','.
In your trials, the Regex wouldn't match multiple times: you have to remember that if you want a Regex to match multiple occurrences in a string, the whole Regex needs to match every single occurrence in the string, so you have to build your Regex not only to match the start of the string 'value_name', but also to match every occurrence of 'vXX' in it.
In C#, you could list all matches and groups using code like this:
Regex r = new Regex(@"(?:^value_name\s+)?([^,]+)(?:\s*,\s*)?");
Match m = r.Match(subjectString);
while (m.Success) {
    for (int i = 1; i < m.Groups.Count; i++) {
        Group g = m.Groups[i];
        if (g.Success) {
            // matched text: g.Value
            // match start: g.Index
            // match length: g.Length
        }
    }
    m = m.NextMatch();
}
I would expect it only to get v1 in the group, because the first comma is "blocking" it from grabbing the rest of the fields. How you handle this is going to depend on the methods you use on the regular expression, but it may make sense to make two passes: first grab all the fields separated by commas and then break things up on spaces. Perhaps ^value_name\s+(?:([^,]+),?)* instead.
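In .NET specifically, a group inside a repeated construct keeps every value it captured in Group.Captures, so the last pattern suggested above can return all the fields from a single match. A rough sketch, with the hypothetical names CsvFields and Extract:

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

static class CsvFields
{
    // Returns the comma-separated values that follow "value_name",
    // or an empty list when the line does not start with it.
    public static List<string> Extract(string line)
    {
        var fields = new List<string>();
        Match m = Regex.Match(line, @"^value_name\s+(?:([^,]+),?)*");
        if (m.Success)
        {
            // Groups[1].Value would only hold the last capture;
            // Captures holds one entry per repetition of the group.
            foreach (Capture c in m.Groups[1].Captures)
            {
                fields.Add(c.Value);
            }
        }
        return fields;
    }
}
```

Most other regex engines keep only the last capture of a repeated group, so this relies on .NET behaviour and is not portable to, for example, JavaScript.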
Oh yeah, lists....
/(?:^value_name\s+|,\s*)([^,]+)/g will theoretically grab them, but you will have to use RegExp.exec() in a loop to get the capture, rather than the whole match.
I wish pre-matches worked in JS :(.
Otherwise, go with Logan's idea: /^value_name\s+([^,]+(?:,\s*[^,]+)*)$/ followed by .split(/,\s*/);
