Check Formatting of a String - c#

This has probably been answered somewhere before, but since there are millions of unrelated posts about string formatting, I could not find it.
Take the following string:
24:Something(true;false;true)[0,1,0]
I want to be able to do two things in this case. I need to check whether or not all the following conditions are true:
There is only one : (done: achieved using Split(), which I needed to use anyway to separate the two parts).
The integer before the : is a 1-3 digit int (done: simple int.Parse logic).
The () exists, and the "Something", in this case any string of fewer than 10 characters, is there.
The [] exists and has at least 1 integer in it. Also, make sure the elements in the [] are integers separated by ,
How can I best do this?
EDIT: I have marked as done what I've achieved so far.

A regular expression is the quickest way. Depending on the complexity it may also be the most computationally expensive.
This seems to do what you need (I'm not that good so there might be better ways to do this):
^\d{1,3}:\w{1,9}\((true|false)(;true|;false)*\)\[\d(,[\d])*\]$
Explanation
\d{1,3}
1 to 3 digits
:
followed by a colon
\w{1,9}
followed by a 1-9 character string of word characters (letters, digits, or underscores),
\((true|false)(;true|;false)*\)
followed by parentheses containing "true" or "false" followed by any number of ";true" or ";false",
\[\d(,[\d])*\]
followed by a set of square brackets containing a digit, followed by any number of comma+digit.
The ^ and $ at the beginning and end of the string indicate the start and end of the string which is important since we're trying to verify the entire string matches the format.
Code Sample
var input = "24:Something(true;false;true)[0,1,0]";
var regex = new System.Text.RegularExpressions.Regex(@"^\d{1,3}:.{1,9}\(.*\)\[\d(,[\d])*\]$");
bool isFormattedCorrectly = regex.IsMatch(input);
Credit @ Ian Nelson
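As a rough sketch, the stricter pattern from the explanation above could be wrapped in a small helper and tried against a few inputs. The class name `FormatCheck`, the named groups, and the failing sample strings are made up for illustration; they are not part of the original answer.

```csharp
using System;
using System.Text.RegularExpressions;

public static class FormatCheck
{
    // Strict pattern from the explanation, with named groups added for readability (an assumption).
    private static readonly Regex Pattern = new Regex(
        @"^(?<id>\d{1,3}):(?<name>\w{1,9})\((?<flags>(?:true|false)(?:;(?:true|false))*)\)\[(?<values>\d(?:,\d)*)\]$");

    // Returns true only when the whole input matches the expected format.
    public static bool IsValid(string input)
    {
        return Pattern.IsMatch(input);
    }

    public static void Main()
    {
        Console.WriteLine(IsValid("24:Something(true;false;true)[0,1,0]")); // True
        Console.WriteLine(IsValid("1234:Something(true)[0]"));             // False: four digits before the colon
        Console.WriteLine(IsValid("24:Something(true;maybe)[0]"));         // False: "maybe" is not true/false
        Console.WriteLine(IsValid("24:Something(true)[]"));                // False: empty brackets
    }
}
```

Like the pattern it is based on, this accepts only single digits inside the brackets; `\d+` would be needed if multi-digit integers should be allowed.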

This is one of those cases where your only sensible option is to use a Regular Expression.
My hasty attempt is something like:
var input = "24:Something(true;false;true)[0,1,0]";
var regex = new System.Text.RegularExpressions.Regex(@"^\d{1,3}:.{1,9}\(.*\)\[\d(,[\d])*\]$");
System.Diagnostics.Debug.Assert(regex.IsMatch(input));
This online RegEx tester should help refine the expression.

I think the best way is to use regular expressions like this:
string s = "24:Something(true;false;true)[0,1,0]";
Regex pattern = new Regex(@"^\d{1,3}:[a-zA-Z]{1,9}\((true|false)(;true|;false)*\)\[\d(,\d)*\]$");
if (pattern.IsMatch(s))
{
// s is valid
}
If you want to allow anything inside (), you can use the following regex:
@"^\d{1,3}:[a-zA-Z]{1,9}\([^:\(]*\)\[\d(,\d)*\]$"

Related

parsing a method Signature using regular expressions

I am trying to use regular expressions to parse a method in the following format from a text:
mvAddSell[value, type1, reference(Moving, 60)]
so using the regular expressions, I am doing the following
tokensizedStrs = Regex.Split(target, "([A-Za-z ]+[\\[ ][A-Za-z0-9 ]+[ ,][A-Za-z0-9 ]+[ ,][A-Za-z0-9 ]+[\\( ][A-Za-z0-9 ]+[, ].+[\\) ][\\] ])");
It is working, but the problem is that it always gives me an empty element at the beginning of the array if the string starts with a method in the given format, and the same happens if the method comes at the end. Also, if two methods appear in the string, it catches only the first one! Why is that?
I think what is causing the parser not to catch two methods is the existence of ".+" in my pattern. What I wanted was to tell it that there will be a number or a date in that location, so I told it that there will be a sequence of any characters. Is that wrong?
It worked for me =D ... I replaced ".+" with ".+?", which matches as few characters as possible ;)
Your goal is quite unclear to me. What do you want as result? If you split on that method pattern, you will get the part before your pattern and the part after your pattern in an array, but not the method itself.
Answer to your question
To answer your concrete question: your .+ is greedy, that means it will match anything till the last )] (in the same line, . does not match newline characters by default).
You can change this behaviour by adding a ? after the quantifier to make it lazy, then it matches only till the first )].
tokensizedStrs = Regex.Split(target, "([A-Za-z ]+[\\[ ][A-Za-z0-9 ]+[ ,][A-Za-z0-9 ]+[ ,][A-Za-z0-9 ]+[\\( ][A-Za-z0-9 ]+[, ].+?[\\) ][\\] ])");
Problems in your regex
There are several other problems in your regex.
I think you misunderstood character classes. When you write e.g. [\\[ ], this construct will match either a [ or a space. If you want to allow optional whitespace after the [ (which would be logical to me), do it this way: \\[\\s*
Use a verbatim string (with a leading @) to define your regex to avoid excessive escaping.
tokensizedStrs = Regex.Split(target, @"([A-Za-z ]+\[\s*[A-Za-z0-9 ]+\s*,\s*[A-Za-z0-9 ]+\s*,\s*[A-Za-z0-9 ]+\(\s*[A-Za-z0-9 ]+\s*,\s*.+?\)\s*\]\s*)");
You can simplify your regex by avoiding repeated parts:
tokensizedStrs = Regex.Split(target, @"([A-Za-z ]+\[\s*[A-Za-z0-9 ]+(?:\s*,\s*[A-Za-z0-9 ]+){2}\(\s*[A-Za-z0-9 ]+\s*,\s*.+?\)\s*\]\s*)");
Here (?:\s*,\s*[A-Za-z0-9 ]+){2} is a non-capturing group repeated two times.
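The greedy versus lazy behaviour described in this answer can be seen in isolation with a much smaller pattern. The sketch below is only an illustration: the class name `GreedyVsLazy`, the second method name `mvAddBuy`, and its arguments are invented, and the pattern is deliberately reduced to the parenthesised part.

```csharp
using System;
using System.Text.RegularExpressions;

public static class GreedyVsLazy
{
    // Counts how many separate matches the pattern finds in the text.
    public static int CountMatches(string text, string pattern)
    {
        return Regex.Matches(text, pattern).Count;
    }

    public static void Main()
    {
        // Hypothetical input with two method calls on the same line.
        string target = "mvAddSell[value, type1, reference(Moving, 60)] mvAddBuy[value, type2, reference(Fixed, 30)]";

        // Greedy: .+ runs to the last ")]" on the line, so both calls collapse into one match.
        Console.WriteLine(CountMatches(target, @"\(.+\)\]"));  // 1

        // Lazy: .+? stops at the first ")]", so each call is matched on its own.
        Console.WriteLine(CountMatches(target, @"\(.+?\)\]")); // 2
    }
}
```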

match multiple words instead of a single word

I have the following regex:
private string tokenRegEx = @"\[%RC:(\w+)%\].*?";
When I pass in the string below, it finds a match:
[%RC:TEST%]
However the following returns false
[%RC:TEST ITEM%]
How can I modify the regex to allow for spaces as well as whole words?
You need to change the \w pattern (which matches alphanum plus underscore only) to something more liberal. For example this would also allow whitespace:
private string tokenRegEx = @"\[%RC:((\w|\s)+)%\].*?";
Of course the "correct" solution would need to take into account exactly what you consider acceptable input, which is kind of open to discussion at this point.
Try this:
@"\[%RC:(\w|\s)+%\].*?";
This would do it; you have to match a space too. You used a group (), but a character set [] is less expensive:
private string tokenRegEx = @"\[%RC:([ \w]+)%\].*?";
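A small sketch of the character-set version in use, assuming the goal is to read the token name out of capture group 1. The class and method names are hypothetical, and the trailing `.*?` is left out here because a lazy quantifier at the end of a pattern matches nothing.

```csharp
using System;
using System.Text.RegularExpressions;

public static class TokenMatch
{
    // Character set allowing spaces and word characters inside the token.
    private static readonly Regex TokenRegex = new Regex(@"\[%RC:([ \w]+)%\]");

    // Returns the captured token name, or null when the input contains no token.
    public static string ExtractToken(string input)
    {
        Match m = TokenRegex.Match(input);
        return m.Success ? m.Groups[1].Value : null;
    }

    public static void Main()
    {
        Console.WriteLine(ExtractToken("[%RC:TEST%]"));      // TEST
        Console.WriteLine(ExtractToken("[%RC:TEST ITEM%]")); // TEST ITEM
    }
}
```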

Remove all zeroes except in numbers

I have to remove all the zeroes in a string, but I have to keep the zeroes in numbers.
The strings I receive are in a format similar to "zeroes-letter-zeroes-number", without the '-' and the numbers are always integers. A few examples:
"0A055" -> "A55"
"0A050" -> "A50"
"0A500" -> "A500"
"0A0505" -> "A505"
"0055" -> "55"
"0505" -> "505"
"0050" -> "50"
I know I can iterate through the characters in the string and set a flag when I encounter a letter or a digit other than 0, but I think that using a RegEx would be nicer. The RegEx would also be more helpful if I have to use this algorithm in the database.
I tried something like this but I don't get the results that I want:
Regex r = new Regex(@"[0*([a-zA-Z]*)0*([1-9]*)]");
string result = r.Replace(input, "");
I'm not so good in writing RegEx-es so please help me if you can.
I'm not convinced that a regex is the best way to approach this, but this one works with all your test cases:
string clean = Regex.Replace(dirty, @"(?<!\d)0+|0+(?!\d|$)", "");
If I understand your pattern correctly, the following should work.
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;
public class Test
{
    public static void Main()
    {
        List<String> samples = new List<String>(new[]{
            "0A055","0A050","0A500","0A0505","0055","0505","0050"
        });
        String re = @"^0*([A-Z]*)0*([1-9]\d*)$";
        // iterate over all results
        samples.ForEach(n => {
            Console.WriteLine("\"{0}\" -> \"{1}\"",
                n,
                Regex.Replace(n, re, "$1$2")
            );
        });
    }
}
With the following output:
"0A055" -> "A55"
"0A050" -> "A50"
"0A500" -> "A500"
"0A0505" -> "A505"
"0055" -> "55"
"0505" -> "505"
"0050" -> "50"
Basically use the pattern to negate all 0s that don't matter, and use the regex replace grouping to re-concatenate the "meaningful" numbers (and letters when present).
Like some of the others I'm not sure regex is the best idea here, but this works with the test cases:
0+(?=[0-9].)|0(?=[a-zA-Z])|(?<=[a-zA-Z])0+
Since you seem to only have one letter, you can split the string in two halves on that letter.
On the left part, trim all zeros.
On the right part, convert it to a number, this will drop all leading zeros or you could use TrimStart.
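The split-on-the-letter idea above could look roughly like this. It is a sketch under a few assumptions: there is at most one letter, the TrimStart variant is used for the right part instead of a numeric conversion, and the `ZeroCleaner` and `Clean` names are invented for the example.

```csharp
using System;

public static class ZeroCleaner
{
    // Removes the zeros around the single letter and the leading zeros of the number.
    public static string Clean(string input)
    {
        // Locate the first letter, if there is one.
        int letterIndex = -1;
        for (int i = 0; i < input.Length; i++)
        {
            if (char.IsLetter(input[i])) { letterIndex = i; break; }
        }

        // No letter: the whole string is the number, so only leading zeros go.
        if (letterIndex < 0) return input.TrimStart('0');

        string left = input.Substring(0, letterIndex).Trim('0');        // zeros before the letter
        string right = input.Substring(letterIndex + 1).TrimStart('0'); // number after the letter
        return left + input[letterIndex] + right;
    }

    public static void Main()
    {
        foreach (string s in new[] { "0A055", "0A050", "0A500", "0A0505", "0055", "0505", "0050" })
        {
            Console.WriteLine("\"{0}\" -> \"{1}\"", s, Clean(s));
        }
    }
}
```

Note that an input consisting only of zeros would come back as an empty string with this approach.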
Doing a replace with a regex will be much harder than extracting the value you want. So try to match the string using a simple regex like the one below:
0*(?<letter>[A-Z])0*(?<number>\d*)
Your match result will then contain two groups, letter and number. Take the values of the two groups and concatenate them, and you will get what you wanted.
Here's a Perl answer for what it's worth
s/0*([a-zA-Z]*)0*([1-9]+0*)/$1$2/g
I don't know how regex is implemented in .NET, so I'll let you write the proper code with the toys in System.Text.RegularExpressions.Regex (MSDN)
Either way, this pattern should work (in pseudo-code):
Replace "(0*)(.+)" by "$2"
0* means zero or more 0
.+ means one or more of any character except a newline
$2 represents the second set of brackets (so we're simply discarding the (0*) part of the string).

Need help with a Regular Expression, Pattern Matching in C#?

I need some help with a simple pattern matching and replacement exercise I am doing.
I need to match both of the following two strings in any string in a given context and it is expected that both patterns are to exist in a given supplied string.
1) "width=000" or "width=00" or "width=0"
2) "drop=000" or "drop=00" or "drop=0"
Each digit can be any value between 0 and 9, so anything from '000' to '999' could be a valid test case in a supplied string.
string url = Regex.Replace(inputString, patternString, replacementValueString);
Thanks,
Have a look at this page to explain the individual elements: http://msdn.microsoft.com/en-us/library/az24scfc.aspx
A regex string like this should work great:
"\b(?:width|drop)\s*=\s*\d{1,3}\b"
To read the name and value in your code:
"\b(?<name>width|drop)\s*=\s*(?<value>\d{1,3})\b"
If you do not need to limit the numbers to only 3 digits, you could use the "\d+" instead of "\d{1,3}".
The "\b" at the beginning will make sure that you don't get a "width" or "drop" that is part of some larger word. The "\b" at the end will prevent you from matching numbers larger than 999.
The "\s*" on either side of the equals statement allow for "drop = 000" as well as "drop=000".
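To show how the named groups might be read in C#, here is a minimal sketch. The pattern is written as a verbatim string so that `\b` and `\s` reach the regex engine unchanged; the class name `WidthDrop`, the `Parse` helper, and the sample URL are assumptions made for the example.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

public static class WidthDrop
{
    // Same pattern as above, as a verbatim string so the backslashes are not treated as C# escapes.
    private static readonly Regex Pair =
        new Regex(@"\b(?<name>width|drop)\s*=\s*(?<value>\d{1,3})\b");

    // Collects every width/drop pair found in the input, keyed by name.
    public static Dictionary<string, string> Parse(string input)
    {
        var result = new Dictionary<string, string>();
        foreach (Match m in Pair.Matches(input))
        {
            result[m.Groups["name"].Value] = m.Groups["value"].Value;
        }
        return result;
    }

    public static void Main()
    {
        // Hypothetical input; "height=1000" is ignored because only width and drop are matched.
        var found = Parse("image.aspx?width=120&drop = 45&height=1000");
        Console.WriteLine(found["width"]); // 120
        Console.WriteLine(found["drop"]);  // 45
        Console.WriteLine(found.Count);    // 2
    }
}
```

Checking that both keys are present in the returned dictionary is one way to enforce the requirement that both patterns exist in the supplied string.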
Something like this would work :
(?:width|drop)=\d{1,3}

Regex which ensures no character is repeated

I need to ensure that a input string follows these rules:
It should contain upper case characters only.
NO character should be repeated in the string.
eg. ABCA is not valid because 'A' is being repeated.
For the upper case thing, [A-Z] should be fine.
But i am lost at how to ensure no repeating characters.
Can someone suggest some method using regular expressions ?
You can do this with .NET regular expressions although I would advise against it:
string s = "ABCD";
bool result = Regex.IsMatch(s, @"^(?:([A-Z])(?!.*\1))*$");
Instead I'd advise checking that the length of the string is the same as the number of distinct characters, and checking the A-Z requirement separately:
bool result = s.Cast<char>().Distinct().Count() == s.Length;
Alternatively, if performance is a critical issue, iterate over the characters one by one and keep a record of which you have seen.
This cannot be done via regular expressions, because they are context-free. You need at least a context-sensitive grammar, so the only way to achieve this is to write the function by hand.
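The character-by-character approach mentioned above might be sketched as follows, using a HashSet to record the characters already seen. The class and method names are hypothetical, and the A-Z check is folded into the same loop as an assumption about how the two rules would be combined.

```csharp
using System;
using System.Collections.Generic;

public static class UniqueUppercase
{
    // True when every character is in A-Z and none appears twice.
    // An empty string is accepted, matching the behaviour of the regex above.
    public static bool IsValid(string s)
    {
        var seen = new HashSet<char>();
        foreach (char c in s)
        {
            if (c < 'A' || c > 'Z') return false; // not an upper-case ASCII letter
            if (!seen.Add(c)) return false;       // Add returns false if c was already recorded
        }
        return true;
    }

    public static void Main()
    {
        Console.WriteLine(IsValid("ABCD")); // True
        Console.WriteLine(IsValid("ABCA")); // False: 'A' is repeated
        Console.WriteLine(IsValid("ABcD")); // False: 'c' is not upper case
    }
}
```

This stops at the first offending character, so it does a single pass over the string at most.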
See formal grammar for background theory.
Why not check for a character which is repeated or not in uppercase instead ? With something like ([A-Z])?.*?([^A-Z]|\1)
Use negative lookahead and backreference.
string pattern = @"^(?!.*(.).*\1)[A-Z]+$";
string s1 = "ABCDEF";
string s2 = "ABCDAEF";
string s3 = "ABCDEBF";
Console.WriteLine(Regex.IsMatch(s1, pattern));//True
Console.WriteLine(Regex.IsMatch(s2, pattern));//False
Console.WriteLine(Regex.IsMatch(s3, pattern));//False
\1 matches the first captured group. Thus the negative lookahead fails if any character is repeated.
This isn't regex, and would be slow, but You could create an array of the contents of the string, and then iterate through the array comparing n to n++
=Waldo
It can be done using what is called a backreference.
I am a Java programmer, so I will show you how it is done in Java (for C#, see here).
final Pattern aPattern = Pattern.compile("([A-Z]).*\\1");
final Matcher aMatcher1 = aPattern.matcher("ABCDA");
System.out.println(aMatcher1.find());
final Matcher aMatcher2 = aPattern.matcher("ABCDA");
System.out.println(aMatcher2.find());
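The regular expression is ([A-Z]).*\\1, which translates to: any character from 'A' to 'Z' as group 1 ('([A-Z])'), followed by anything else (.*), followed by group 1 again.
In C#, the backreference inside the pattern is also written \1; $1 is only used in replacement strings.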
Hope this helps.
