Regex remove character folloing n characters - c#

I am a bit out of exercise after two years not coding.
I have thousand of lines in a txt file. All similar to this one:
X0 Y0 S-0.30
X0 Y0 S-0.21
X0 Y0 S-0.08
I need to remove the S-x.xx value from all lines. So only the X Y and the relevant values will be saved for each line.
I have attempted with Regex
if (Line.Contains("S"))
{
Regex rgx = new Regex(#"S\d+(\.\d+)?");
Line = rgx.Replace(Line, "");
}
return Line;
But I am not getting the result I expect. Any hint will be appreciated.
Thanks

You can use the following RegEx : (X\d+ Y\d+).*
And use $1.
X matches the character X literally (case sensitive)
\d+ matches a digit (equal to [0-9]) one or more times
Y matches the characters Y literally (case sensitive)
\d+ matches a digit (equal to [0-9]) one or more times

You may use
var result = Regex.Replace(s, #"\s+S-\d+(?:\.\d+)?", string.Empty);
See the regex demo
\s+ - 1+ whitespaces
S- - S- substring
\d+(?:\.\d+)? - a number pattern:
\d+ - 1+ digits
(?:\.\d+)? - an optional sequence:
\. - dot
\d+ - 1+ digits.

Appreciate that you really just want to remove the final third term, which begins with S. Just try replacing the pattern \sS.*$ with empty string:
Regex rgx = new Regex(#"\sS.*$");
string Line = "X0 Y0 S-0.30";
Line = rgx.Replace(Line, "");
Console.WriteLine(Line);
X0 Y0
Demo

Related

How to match string format in Regex c#

I am passing a correct string formate but its not return true.
string dimensionsString= "13.5 inches high x 11.42 inches wide x 16.26 inches deep";
// or 10.1 x 12.5 x 30.9 inches
// or 10.1 x 12.5 x 30.9 inches ; 3.2 pounds
Regex rgxFormat = new Regex(#"^([0-9\.]+) ([a-z]+) x ([0-9\.]+) ([a-z]+) x ([0-9\.]+) ([a-z]+)( ; ([0-9\.]+) ([a-z]+))?$");
if (rgxFormat.IsMatch(dimensionsString))
{
//match
}
I can't understand whats wrong with code ?
Your pattern only accounts for single words after the numbers. Allow any number of symbols there (with .* or .*?) to fix the pattern:
^([0-9.]+) (.*?) x ([0-9\.]+) (.*?) x ([0-9.]+) (.*?)( ; ([0-9.]+) (.*))?$
See the regex demo.
Note that the last .* is used with a greedy quantifier since it is the last unknown bit in the string (to match all the rest of the string). The .*? are non-greedy versions that match as few occurrences of any char but a newline as possible.
Replace regular spaces with \s to match any kind of whitespace if necessary.

Replace floating numbers in math equation with letter variables

I want to replace all the floating numbers from a mathematical expression with letters using regular expressions. This is what I've tried:
Regex rx = new Regex("[-]?([0-9]*[.])?[0-9]+");
string expression = "((-30+5.2)*(2+7))-((-3.1*2.5)-9.12)";
char letter = 'a';
while (rx.IsMatch(expression))
{
expression = rx.Replace(expression , letter.ToString(), 1);
letter++;
}
The problem is that if I have for example (5-2)+3 it will replace it to: (ab)+c
So it gets the -2 as a number but I don't want that.
I am not experienced with Regex but I think I need something like this:
Check for '-', if there is a one, check if there is a number or right parenthesis before it. If there is NOT then save the '-'.
After that check for digits + dot + digits
My above Regex also works with values like: .2 .3 .4 but I don't need that, it should be explicit: 0.2 0.3 0.4
Following the suggested logic, you may consider
(?:(?<![)0-9])-)?[0-9]+(?:\.[0-9]+)?
See the regex demo.
Regex details
(?:(?<![)0-9])-)? - an optional non-capturing group matching 1 or 0 occurrences of
(?<![)0-9]) - a place in string that is not immediately preceded with a ) or digit
- - a minus
[0-9]+ - 1+ digits
(?:\.[0-9]+)? - an optional non-capturing group matching 1 or 0 occurrences of a . followed with 1+ digits.
In code, it is better to use a match evaluator (see the C# demo online):
Regex rx = new Regex(#"(?:(?<![)0-9])-)?[0-9]+(?:\.[0-9]+)?");
string expression = "((-30+5.2)*(2+7))-((-3.1*2.5)-9.12)";
char letter = (char)96; // char before a in ASCII table
string result = rx.Replace(expression, m =>
{
letter++; // char is incremented
return letter.ToString();
}
);
Console.WriteLine(result); // => ((a+b)*(c+d))-((e*f)-g)

Regex - Trying to write match validation for Swedish Social Number

So I want the formats xxxxxx-xxxx AND xxxxxxxx-xxxx to be possible. I've managed to fix the first section before the dash, but the last four digits are troublesome. It does require to match at least 4 characters, but I also want the regex to return false if there's more than 4 characters. How do I do it?
This is how it looks so far:
var regex = new Regex(#"^\d{6,8}[-|(\s)]{0,1}\d{4}");
And this is the results:
var regex = new Regex(#"^\d{6,8}[-|(\s)]{0,1}\d{4}");
Match m = regex.Match("840204-2344");
Console.WriteLine(m.Success); // Outputs True
Match m = regex.Match("19840204-2344");
Console.WriteLine(m.Success); // Outputs True
Match m = regex.Match("19840204-23");
Console.WriteLine(m.Success); // Outputs false
Match m = regex.Match("19840204-2323423423");
Console.WriteLine(m.Success); // Outputs true, and this is what I don't want
The \d{6,8} pattern matches 6, 7 or 8 digits, so that will already invalidate your regex pattern. Besdies, [-|(\s)]{0,1} matches 1 or 0 -, (, ), | or whitespace chars, and will also match strings like 19840204|2323, 19840204(2323 and 19840204)2323.
You may use
^\d{6}(?:\d{2})?[-\s]?\d{4}$
See the regex demo.
Details
^ - start of string
\d{6} - 6 digits
(?:\d{2})? - optional 2 digits
[-\s]? - 1 or 0 - or whitespaces
\d{4} - 4 digits
$ - end of string.
To make \d only match ASCII digits, pass RegexOptions.ECMAScriptoption. Example:
var res = Regex.IsMatch(s, #"^\d{6}(?:\d{2})?[-\s]?\d{4}$", RegexOptions.ECMAScript);
You are forgetting the $ at the end:
var regex = new Regex(#"^(\d{6}|\d{8})-\d{4}$");
If you want to match the social security number anywhere in a string, you van also use \b to test for boundaries:
var regex = new Regex(#"\b(\d{6}|\d{8})-\d{4}\b");
Edit: I corrected the RegEx to fix the problems mentioned in the comments. The commentors are right, of course. In my earlier post I just wanted to explain why the RegEx matched the longer string.

Regular expression [0-99]p/sec

A part of my application requires to find out the occurrence of "1p/sec" or "22p/sec" or "22p/ sec" or ( [00-99]p/sec also [00-99]p/ sec) in a string.
so far I am able to get only the first occurrence(i.e if its a single digit, like the one in the above string). I should be able to get 'n' number of occurrence
Someone pl provide guidance
string input = "US Canada calling # 1p/ sec (Base Tariff - 11p/sec). Validity : 30 Days.";
// Here we call Regex.Match.
Match match = Regex.Match(input, #"(\d)[p/sec]",
RegexOptions.IgnorePatternWhitespace);
// input.IndexOf("p/sec");
// Here we check the Match instance.
if (match.Success)
{
// Finally, we get the Group value and display it.
string key = match.Groups[1].Value;
Console.WriteLine(key);
}
You need to quantify your \d in the regex, for example by adding a + quantifier, which will then cause \d+ to match at least one but possibly more digits. To restrict to a specific number of digits, you can use the {n,m} quantifier, e.g. \d{1,2} which will then match either one or two digits.
Note also that [p/sec] as you use it in the regex is a character class, matching a single character from the set { c, e, p, s, / }, which is probably not what you want because you'd want to match the p/sec literally.
A more robust option would probably be the following
(\d+)\s*p\s*/\s*sec
which a) matches p/sec literally and also allows for whitespace between the number and the unit as well as around the /.
Use
Match match = Regex.Match(input, #"(\d{1,2})p/sec" ...
instead.
\d mathces a single digit. If you append {1,2} to that you instead match one - two digits. \d* would match zero or more and \d+ would match one or more. \d{1,10} would match 1-10 digits.
If you need to know if it was surrounded by brackets or not you could do
Match match = Regex.Match(input, #"(([\d{1,2}])|(\d{1,2}))p/sec"
...
bool hasBrackets = match.Groups[1].Value[0] == '[';
How about this regex:
^\[?\d\d?\]?/\s*sec$
Explanation:
The regular expression:
^\[?\d\d?\]?/\s*sec$
matches as follows:
NODE EXPLANATION
----------------------------------------------------------------------
^ the beginning of the string
----------------------------------------------------------------------
\[? '[' (optional (matching the most amount
possible))
----------------------------------------------------------------------
\d digits (0-9)
----------------------------------------------------------------------
\d? digits (0-9) (optional (matching the most
amount possible))
----------------------------------------------------------------------
\]? ']' (optional (matching the most amount
possible))
----------------------------------------------------------------------
/ '/'
----------------------------------------------------------------------
\s* whitespace (\n, \r, \t, \f, and " ") (0 or
more times (matching the most amount
possible))
----------------------------------------------------------------------
sec 'sec'
----------------------------------------------------------------------
$ before an optional \n, and the end of the
string
----------------------------------------------------------------------
If i get you right, you want to find all occurrances, right? Use a matchcollection
string input = "US Canada calling # 1p/sec (Base Tariff - 11p/sec). Validity : 30 Days.";
// Here we call Regex.Match.
Regex regex = new Regex(#"(\d){1,2}[p/sec]", RegexOptions.IgnorePatternWhitespace);
MatchCollection matchCollection = regex.Matches(input);
// input.IndexOf("p/sec");
// Here we check the Match instance.
foreach (Match match in matchCollection)
{
Console.WriteLine(match.Groups[0].Value);
}
You can do this:
string input = "US Canada calling # 1p/ sec (Base Tariff - 11p/sec). 91p/ sec , 123p/ sec Validity : 30 Days.";
MatchCollection matches = Regex.Matches(input, #"\b(\d{1,2})p/\s?sec");
foreach (Match m in matches) {
string key = m.Groups[1].Value;
Console.WriteLine(key);
}
Output:
1
11
91
\b is a word boundary, to anchor the match to the start of a "word" (notice that it will not match the "123p/ sec" in the test string!)
\d{1,2} Will match one or two digits. See Quantifiers
p/\s?sec matches a literal "p/sec" with an optional whitespace before "sec"
Regex Expression:
((?<price>(\d+))p/sec)|(\[(?<price>(\d+[-]\d+))\]p/sec)
In C# you need to run a for loop to check multiple captures:
if(match.Success)
{
for(int i = 0; i< match.Groups["price"].Captures.Count;i++)
{
string key = match.Groups["price"].Value;
Console.WriteLine(key);
}
}

C# Regex Grouping

I have to write a function that will get a string and it will have 2 forms:
XX..X,YY..Y where XX..X are max 4 characters and YY..Y are max 26 characters(X and Y are digits or A or B)
XX..X where XX..X are max 8 characters (X is digit or A or B)
e.g. 12A,784B52 or 4453AB
How can i user Regex grouping to match this behavior?
Thanks.
p.s. sorry if this is to localized
You can use named captures for this:
Regex regexObj = new Regex(
#"\b # Match a word boundary
(?: # Either match
(?<X>[AB\d]{1,4}) # 1-4 characters --> group X
, # comma
(?<Y>[AB\d]{1,26}) # 1-26 characters --> group Y
| # or
(?<X>[AB\d]{1,8}) # 1-8 characters --> group X
) # End of alternation
\b # Match a word boundary",
RegexOptions.IgnorePatternWhitespace);
X = regexObj.Match(subjectString).Groups["X"].Value;
Y = regexObj.Match(subjectString).Groups["Y"].Value;
I don't know what happens if there is no group Y, perhaps you might need to wrap the last line in an if statement.

Categories

Resources