Allow conditional usage of semicolon in regex pattern - c#

I have the following pattern:
UnallowedCharacters = @"<>\{\}" + "\"";
@"^(?<contactType>\d+):(?<contactIdentifier>[^;" + UnallowedCharacters + @"]+)(;(?<parameterName>[A-Za-z0-9_-]+)=(?<parameterValue>[^;=" + UnallowedCharacters + "]+))*$"
I need to allow a semicolon inside the contactIdentifier part, but I cannot simply drop the semicolon from the excluded characters, because the later split into parameters would then stop working.
Two examples of input and expected output are the following:
input: "8:test;aliases=1:test@outlook.com,4:test" => after parsing, expected output should be "8:test" for contactIdentifier part
input: "8:test;.person@domain.com;aliases=1:test@outlook.com,4:test" => after parsing, expected output should be "8:test;.person@domain.com" for contactIdentifier part
The semicolons are used for splitting the unparsed string into multiple parts during parsing, but I want to allow a semicolon in the contactIdentifier character group without affecting the existing matching and parsing logic.
Any ideas?

If I have understood the question, you can do this:
UnallowedCharacters = @"<>{}"""; (no need to escape inside a character group)
(?<contactIdentifier>(?:[;]|[^" + UnallowedCharacters + @"])+)
Explanation:
I changed the <contactIdentifier> group to:
?<contactIdentifier> the name
(?: start of a (non-capturing) group
[;]| ';' OR:
[^" + UnallowedCharacters + @"] one character not in the class
)+ the whole group repeated one or more times
) end of the named group
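A minimal sketch of how that group could be plugged into the full pattern from the question. One assumption that is not in the answer: the quantifier is made lazy (+?). With a greedy + the identifier group would also swallow the trailing parameters, because nothing stops it at a semicolon any more. The class and method names (ContactParser, ParseIdentifier) are hypothetical.

```csharp
using System;
using System.Text.RegularExpressions;

public static class ContactParser
{
    // Same character set as in the answer: < > { } and the double quote.
    private const string UnallowedCharacters = @"<>{}""";

    // Assumption: "+?" (lazy) instead of "+", so the identifier stops at the
    // first ';' from which the rest of the input parses as name=value parameters.
    private static readonly Regex ContactPattern = new Regex(
        @"^(?<contactType>\d+):(?<contactIdentifier>(?:[;]|[^" + UnallowedCharacters + @"])+?)"
        + @"(;(?<parameterName>[A-Za-z0-9_-]+)=(?<parameterValue>[^;=" + UnallowedCharacters + @"]+))*$");

    // Returns the contactIdentifier group, or null when the input does not match.
    public static string ParseIdentifier(string input)
    {
        Match match = ContactPattern.Match(input);
        return match.Success ? match.Groups["contactIdentifier"].Value : null;
    }
}
```

With this sketch, ContactParser.ParseIdentifier("8:test;aliases=1:test@outlook.com,4:test") gives test, and the input "8:test;.person@domain.com;aliases=1:test@outlook.com,4:test" gives test;.person@domain.com, because ";.person" cannot be read as a parameter name.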

Related

How to validate comma separated string with space using Regex

I need to validate a comma-separated string using regex, but I have two problems.
My sample input is as follows:
ERWSW1,ERWSW2,ASA,S4,ERWSW5,ERWSW6,ERWSW7 - Valid
ERW SW1,ERW SW2,ASA,S4,ERW SW5,ERW SW6,ERWSW7 - Valid(space between word should valid)
ERWSW1,ERWSW2,ASA,S4,ERWSW5,ERWSW6,ERWSW7, - Invalid - Comma at end
,ERWSW1,ERWSW2,ASA,S4,ERWSW5,ERWSW6,ERWSW7 - Invalid - Comma at beginning
ERWSW1,ERWSW2,,ASA,S4,ERWSW5,ERWSW6,ERWSW7 - Invalid - No value between 2,3 comma
I wrote following Regex to validate the input
^([a-z A-Z0-9 !@#$%?=*&-]+,)*[a-z A-Z0-9 !@#$%?=*&\s-]+$
The first problem is that a string with only spaces between the commas is accepted as valid.
E.g.: ERWSW1, , ,ERWSW2,ASA,S4
I need to reject that. How can I do it?
My second problem is that I also need to remove the extra spaces from the string, and I need a function for that (this is not related to the regex above).
Input: ERWSW1 , ERW SW2,ASA ,S4 ,ERW SW5,ERWSW6,ERWSW7
I need the following output,
ERWSW1,ERW SW2,ASA,S4,ERW SW5,ERWSW6,ERWSW7
Updated :
for my second problem, I wrote the following code,
string str = " ERW SW1 , ERW SW2 , ASA";
var ss = Regex.Replace(str, " *, *", ",");
But it's not removing spaces properly, I need this output
ERW SW1,ERW SW2,ASA
You could use a character class specifying what you would allow to match. For the spaces between the words you could use a repeating group preceded with a space.
^[\w!@#$%?=*&.-]+(?: [\w!@#$%?=*&.-]+)*(?:,[\w!@#$%?=*&.-]+(?: [\w!@#$%?=*&.-]+)*)*$
To remove the spaces around the commas, first check the string against a pattern that also allows spaces around each comma and at both ends (shown below), then replace every comma together with its surrounding spaces, matched by " *, *", with a single comma.
^ *[\w!@#$%?=*&.-]+(?: [\w!@#$%?=*&.-]+)*(?: *, *[\w!@#$%?=*&.-]+(?: [\w!@#$%?=*&.-]+)*)* *$
Code example
string[] strings = {
"ERWSW1,ERWSW2,ASA,S4,ERWSW5,ERWSW6,ERWSW7",
"ERW SW1,ERW SW2,ASA,S4,ERW SW5,ERW SW6,ERWSW7",
"ERWSW1,ERWSW2,ASA,S4,ERWSW5,ERWSW6,ERWSW7,",
",ERWSW1,ERWSW2,ASA,S4,ERWSW5,ERWSW6,ERWSW7",
"ERWSW1,ERWSW2,,ASA,S4,ERWSW5,ERWSW6,ERWSW7",
"ERWSW1 , ERW SW2,ASA ,S4 ,ERW SW5,ERWSW6,ERWSW7",
"ERW*SW1,ERW-SW2,A.SA",
" ERWSW1 , ERWSW2 ,ASA,S4,ERWSW5 "
};
string pattern = @"^ *[\w!@#$%?=*&.-]+(?: [\w!@#$%?=*&.-]+)*(?: *, *[\w!@#$%?=*&.-]+(?: [\w!@#$%?=*&.-]+)*)* *$";
foreach (String s in strings) {
if (Regex.IsMatch(s, pattern)) {
Console.WriteLine(Regex.Replace(s, " *, *", ",").Trim());
}
}
Output
ERWSW1,ERWSW2,ASA,S4,ERWSW5,ERWSW6,ERWSW7
ERW SW1,ERW SW2,ASA,S4,ERW SW5,ERW SW6,ERWSW7
ERWSW1,ERW SW2,ASA,S4,ERW SW5,ERWSW6,ERWSW7
ERW*SW1,ERW-SW2,A.SA
ERWSW1,ERWSW2,ASA,S4,ERWSW5

Using Regex to replace part of the entire string/expression

Regexes are simple yet complex at times. I am stuck trying to rewrite the variables in an expression, assuming a variable has the following pattern:
\w+(\.\w+)*
I want to replace the dot (.) in every occurrence of a variable, because I eventually have to tokenize the expression and the tokenizer does not recognize variables that contain dots. So I thought I would replace the dots with underscores before parsing. After tokenizing, however, I want to get each variable token back with its original value.
Expression:
(x1.y2.z3 + 9.99) + y2_z1 - x1.y2.z3
Three Variables:
x1.y2.z3
y2_z1
x1.y2.z3
Desired Output:
(x1_y2_z3 + 9.99) + y2_z1 - x1_y2_z3
Question 1: How do I use Regex replace in this case?
Question 2: Is there a better way to address this problem? A variable can already contain an underscore, so replacing dots with underscores does not let me recover the original variable name from the tokens.
This regex pattern seems to work: [a-zA-Z]+\d+\S+
To replace a dot found only in a match you use MatchEvaluator:
private static char charToReplaceWith = '_';
static void Main(string[] args)
{
string s = "(x1.y2.z3 + 9.99) + y2_z1 - x1.y2.z3";
Console.WriteLine(Regex.Replace(s, @"[a-zA-Z]+\d+\S+", new MatchEvaluator(ReplaceDotWithCharInMatch)));
Console.Read();
}
private static string ReplaceDotWithCharInMatch(Match m)
{
return m.Value.Replace('.', charToReplaceWith);
}
Which gives this output:
(x1_y2_z3 + 9.99) + y2_z1 - x1_y2_z3
I don't fully understand your second question and how to deal with tokenizing variables that already have underscores, but you should be able to choose the character to replace with (i.e., if string.Contains('_') is true, you choose a different character). You will probably have to maintain a dictionary that records something like "I replaced all dots with underscores, and all underscores with ^", and so on.
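One way the dictionary idea could look, as a rough sketch rather than a definitive solution. Instead of swapping single characters, each dotted variable is replaced by a generated placeholder and the original name is recorded so it can be restored after tokenizing. Everything here is an assumption made for illustration: the VariableMasker class, the Mask method, the "__var" placeholder prefix (which is assumed not to occur in real variable names), and the rule that a variable starts with a letter or underscore.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

public static class VariableMasker
{
    // Assumption: a variable starts with a letter or underscore, so "9.99" is left alone.
    private static readonly Regex Variable = new Regex(@"[A-Za-z_]\w*(?:\.\w+)*");

    // Replaces every dotted variable with a placeholder and records
    // placeholder -> original name in 'originals'.
    public static string Mask(string expression, IDictionary<string, string> originals)
    {
        var placeholders = new Dictionary<string, string>(); // original -> placeholder
        return Variable.Replace(expression, m =>
        {
            if (m.Value.IndexOf('.') < 0) return m.Value; // no dot, nothing to hide
            if (!placeholders.TryGetValue(m.Value, out string placeholder))
            {
                // First time this variable is seen: invent a dot-free name for it.
                placeholder = "__var" + placeholders.Count;
                placeholders[m.Value] = placeholder;
                originals[placeholder] = m.Value;
            }
            return placeholder;
        });
    }
}
```

For the expression from the question, Mask would return (__var0 + 9.99) + y2_z1 - __var0 and originals would map __var0 back to x1.y2.z3, so an existing name such as y2_z1 can never be confused with a rewritten one.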
Try this:
string input = "(x1.y2.z3 + 9.99) + y2_z1 - x1.y2.z3";
string output = Regex.Replace(input, "\\.(?=[a-z])", "_");
This will replace only periods which are followed by a lowercase letter (a-z).
Use a regex negative lookahead, which is a group that starts with (?!
A dot followed by something non-numeric would be as simple as this:
// matches any dot NOT followed by a character in the range 0-9
String output = Regex.Replace(input, "\\.(?![0-9])", "_");
This has the advantage that although [0-9] is part of the expression, it is only checked against the character following the dot and is not part of the match itself, so it is not replaced.

Regular Expression without braces

I have the following sample cases:
1) "Sample"
2) "[10,25]"
I want to form a single regular expression pattern that, given the above examples, returns "Sample" and "10,25".
Note: Input strings do not include Quotes.
I came up with the following expression: (?<=\[)(.*?)(?=\]). It satisfies the second case and retrieves only "10,25", but for the first case it returns nothing. I want "Sample" to be returned. Can anyone help me?
C#.
Here you go: a small regex using a positive lookbehind. These are sometimes very handy.
Regex
(?<=^|\[)([\w,]+)
Test string
Sample
[10,25]
Result
MATCH 1
[0-6] Sample
MATCH 2
[8-13] 10,25
try at regex101.com
If " is included in your original string, use the regex below, which also looks for the " mark. You may remove ^| from the lookbehind if the " mark is always present, or leave it as it is if your text mixes strings with and without " marks.
Regex
(?<=^|\[|\")([\w,]+)
try at regex101.com
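A short C# sketch of the first pattern above, (?<=^|\[)([\w,]+), applied to the two sample inputs without quotes. The class and method names (BracketStripper, Extract) are hypothetical. The lookbehind has alternatives of different lengths, which the .NET regex engine accepts.

```csharp
using System;
using System.Text.RegularExpressions;

public static class BracketStripper
{
    // A run of word characters or commas that starts either at the
    // beginning of the input or directly after an opening bracket.
    private static readonly Regex Inner = new Regex(@"(?<=^|\[)([\w,]+)");

    // Returns the captured text, or null when nothing matches.
    public static string Extract(string input)
    {
        Match match = Inner.Match(input);
        return match.Success ? match.Groups[1].Value : null;
    }
}
```

BracketStripper.Extract("Sample") returns Sample and BracketStripper.Extract("[10,25]") returns 10,25. The closing bracket is simply never consumed, because ] is not in the character class.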
As far as I can tell, the below regex should help:
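Regex regex = new Regex(@"^\w+|[[](\w)+\,(\w)+[]]$");
This will match a single word at the start of the input, or two alphanumeric words separated by a comma and enclosed in square brackets. Note that in the second case the match itself still includes the brackets.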
One Java example:
// String input = "Sample";
String input = "[10,25]";
String text = "[^,\\[\\]]+";
Pattern pMod = Pattern.compile("(" + text + ")|(?>\\[(" + text + "," + text + ")\\])");
Matcher mMod = pMod.matcher(input);
while (mMod.find()) {
if(mMod.group(1) != null) {
System.out.println(mMod.group(1));
}
if(mMod.group(2)!=null) {
System.out.println(mMod.group(2));
}
}
if input is "[hello&bye,25|35]", then the output is hello&bye,25|35

Regex pattern works in Lua but not in C#

I need to use regex in C# to split up something like "21A244" where
The first two numbers can be 1-99
The letter can only be 1 letter, A-Z
The last three numbers can be 111-999
So I made this match
"([0-9]+)([A-Z])([0-9]+)"
but for some reason when used in C#, the match functions just return the input string. So I tried it in Lua, just to make sure the pattern was correct, and it works just fine there.
Here's the relevant code:
var m = Regex.Matches( mdl.roomCode, "(\\d+)([A-Z])(\\d+)" );
System.Diagnostics.Debug.Print( "Count: " + m.Count );
And here's the working Lua code in case you were wondering
local str = "21A244"
print(string.match( str, "(%d+)([A-Z])(%d+)" ))
Thank you for any help
EDIT: Found the solution
var match = Regex.Match(mdl.roomCode, "(\\d+)([A-Z])(\\d+)");
var group = match.Groups;
System.Diagnostics.Debug.Print( "Count: " + group.Count );
System.Diagnostics.Debug.Print("houseID: " + group[1].Value);
System.Diagnostics.Debug.Print("section: " + group[2].Value);
System.Diagnostics.Debug.Print("roomID: " + group[3].Value);
Firstly you should make your regex a little more specific and limit how many numbers are allowed at the beginning/end. How about:
([1-9]{1,2})([A-Z])([1-9]{1,3})
Next, the results of the captures (i.e. the 3 parts in parens) will be in the Groups property of the Match object (m here is a single Match, such as the one returned by Regex.Match). I.e.
m.Groups[1] // First number
m.Groups[2] // Letter
m.Groups[3] // Second number
Regex.Matches(mdl.roomCode, "(\\d+)([A-Z])(\\d+)") returns a collection of matches. If there is no match, it returns an empty MatchCollection.
Since the regular expression matches the whole string, it returns a collection with one item: a Match whose value is the entire input string. The individual parts are in that match's Groups.
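The difference between the match count and the captured groups can be sketched as follows, using the pattern from the question. The class and method names (RoomCodeParser, CountMatches, Split) are hypothetical.

```csharp
using System;
using System.Text.RegularExpressions;

public static class RoomCodeParser
{
    private const string Pattern = "(\\d+)([A-Z])(\\d+)";

    // Number of matches found in the input; this is what Regex.Matches(...).Count reports.
    public static int CountMatches(string roomCode)
    {
        return Regex.Matches(roomCode, Pattern).Count;
    }

    // Captured parts of the first match: house number, section letter, room number.
    // Returns an empty array when nothing matches.
    public static string[] Split(string roomCode)
    {
        Match match = Regex.Match(roomCode, Pattern);
        if (!match.Success) return new string[0];
        // Groups[0] is the whole match; the parenthesised captures start at index 1.
        return new[] { match.Groups[1].Value, match.Groups[2].Value, match.Groups[3].Value };
    }
}
```

For "21A244", CountMatches returns 1 (one match covering the whole string), while Split returns the three parts 21, A and 244. Groups.Count on that match would be 4, because the whole match counts as group 0.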

Replace multiple lines with .net Regex

I am new to Stack Overflow (my first post) and to regex.
Currently I am working on a quick and dirty app to replace base class properties with constructor-injected fields (because I need to edit about 400 files).
It should find this:
ClassName(WiredObjectRegistry registry) : base(registry)
{
and replace with:
ClassName(IDependency paramName, ISecondDependency secondParam, ... )
{
_fieldName = paramName;
...
So I need to replace the two old lines with three or more new lines.
Basically I was thinking:
find this ->
className + ctorParams + zero or more
whitespaces + newline + zero or more
whitespaces + {
replace with ->
className + newCtorParams + newline +
{
my field assignments
I tried this regex for .NET:
className + ctorParam + @"\w*" + "\r|\n" + @"\w*" + @"\{"
It does not replace the "{" and the whitespace correctly.
The replaced file content looks like this:
public CacheManager(ICallManager callManager, ITetraEventManager tetraEventManager, IConferenceManager conferenceManager, IAudioManager audioManager)
{
_callManager = callManager;
_tetraEventManager = tetraEventManager;
_conferenceManager = conferenceManager;
_audioManager = audioManager;
{
can u please help me with this :-|
david
If you're translating
className + ctorParams + zero or more whitespaces + newline + zero or more whitespaces + {
into regex as
className + ctorParam + @"\w*" + "\r|\n" + @"\w*" + @"\{"
then you're making several errors.
First, the character class for whitespace is \s. \w means "word character" (a letter, a digit or an underscore).
Second, "\r|\n" will result in the alternation operator | separating the entire regex into two alternative parts (= "match either the regex before the | or the regex after the |"). In your case, you don't need this bit at all since \s will already match spaces, tabs and newlines. If you do want a regex that matches a Unix, Mac or DOS newline, use \r?\n? (note that this can also match an empty string; (?:\r\n|\r|\n) is stricter).
But, as the comments show, unless you show us what you really want to do, we can't help you further.
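A minimal sketch of the \s-based approach, assuming the old constructor header always has the exact form shown in the question. The class and method names (CtorRewriter, Rewrite) and the parameter names are hypothetical. The whitespace before the opening brace is captured and reused, so the original line break and indentation are kept, and the brace is part of the match, so it is not duplicated.

```csharp
using System;
using System.Text.RegularExpressions;

public static class CtorRewriter
{
    // Replaces "ClassName(WiredObjectRegistry registry) : base(registry) {"
    // with the new parameter list, the same brace and the field assignments.
    public static string Rewrite(string source, string className,
                                 string newParameters, string fieldAssignments)
    {
        // \s* covers spaces, tabs and any kind of line break; group 1 keeps
        // whatever whitespace originally preceded the opening brace.
        string pattern = Regex.Escape(className)
            + @"\(WiredObjectRegistry\s+registry\)\s*:\s*base\(registry\)(\s*)\{";

        // A MatchEvaluator is used so that '$' in the inserted text is never
        // interpreted as a substitution token.
        return Regex.Replace(source, pattern, m =>
            className + "(" + newParameters + ")"
            + m.Groups[1].Value + "{" + Environment.NewLine
            + fieldAssignments);
    }
}
```

Called with the class name CacheManager, a new parameter list and the assignment lines, this would leave exactly one opening brace after the new constructor header, followed by the assignments and then the untouched constructor body.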
