regex to split numbers and operators - c#

How do I split the string below into a list of strings with numbers and operators separated? The string contains no parentheses or negative numbers.
Example:
inputString = 1+2-2.3*4/12.12
outputList = {1,+,2,-,2.3,*,4,/,12.12}
The code below gives me numbers only; I need the operators as well:
var digits = Regex.Split(inputString, @"\D+");

Since you confirm the structure of the input is rather simplistic - no parentheses, no negative numbers - you can just use a simple \s*([-+/*])\s* regex to split the string.
Note that Regex.Split will also output all captured substrings in the result:
If capturing parentheses are used in a Regex.Split expression, any captured text is included in the resulting string array.
So, use
Regex.Split(input, @"\s*([-+/*])\s*")
    .Where(n => !string.IsNullOrEmpty(n))
    .ToList();
Just do not forget to remove empty elements from the resulting list/array.
Pattern details:
\s* - zero or more whitespaces (to "trim" the elements)
([-+/*]) - Group 1 capturing a -, +, / or *
\s* - zero or more whitespaces (to "trim" the elements)
See the IDEONE demo:
var input = "1+2-2.3*4/12.12";
var results = Regex.Split(input, @"\s*([-+/*])\s*")
    .Where(n => !string.IsNullOrEmpty(n))
    .ToList();
Console.WriteLine(string.Join("\n", results));

You could use Regex.Matches instead of Regex.Split :
var test = "1 + 2 - 2.3 * 4 / 12.12";
foreach (Match match in Regex.Matches(test, @"\d+(,\d+)*(\.\d+(e\d+)?)|\d+|[-+*/]"))
    Console.WriteLine(match.Value);

This seemed to work for me
/([\d\.]+)|([-+\/*])+/g
FYI - LinqPad is an awesome tool to test Regex in C#
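The pattern above is written as a JavaScript-style literal (/.../g), so it cannot be pasted into C# as is. Below is a minimal sketch of a C# equivalent, assuming the input holds only non-negative numbers and the four operators. The names TokenizerSketch and Tokenize are made up for illustration.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text.RegularExpressions;

public static class TokenizerSketch
{
    // Hypothetical helper: returns numbers and operators as separate tokens.
    public static List<string> Tokenize(string input)
    {
        // [\d.]+ matches a run of digits and dots; [-+*/] matches one operator.
        // Regex.Matches returns every match, which plays the role of the /g flag.
        return Regex.Matches(input, @"[\d.]+|[-+*/]")
                    .Cast<Match>()
                    .Select(m => m.Value)
                    .ToList();
    }

    public static void Main()
    {
        // Prints 1,+,2,-,2.3,*,4,/,12.12
        Console.WriteLine(string.Join(",", Tokenize("1+2-2.3*4/12.12")));
    }
}
```

Whitespace between tokens is skipped because neither alternative matches it.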

Related

Regex - Match everything except specified characters or split the string

I'm using this Regex [^,]+ that matches groups of non-commas but it needs to do this also for characters ; , \n and empty space.
I have this string 12f3,, 456;;;;\n\n227- , 999 from which I need to get all the substrings like 12f3 , 456, 227- and 999.
Is there a way of matching everything except some specified characters, or is best to use split in this situation?
You can match these substrings with [^\s;,]+ pattern. Splitting with [\s;,]+ is not recommended as there are often empty strings in the resulting array after splitting (due to either matches at the start/end of string, or consecutive matches).
See the regex demo.
In C#, use
var matches = Regex.Matches(text, @"[^\s;,]+")
    .Cast<Match>()
    .Select(x => x.Value)
    .ToList();
The [^\s;,]+ pattern matches one or more (due to the + quantifier) occurrences of any char other than ([^...] is a negated character class) a whitespace (\s), a semicolon and a comma.
Non-regex approach
You can split your string with comma and semi-colon, remove empty entries and then trim the strings in the resulting array:
var text = "12f3,, 456;;;;\n\n227- , 999";
var res = text.Split(new[] {";", ","}, StringSplitOptions.RemoveEmptyEntries)
    .Select(x => x.Trim());
Console.WriteLine(string.Join("\n", res));
See the C# demo. Output:
12f3
456
227-
999

Regex to match alphanumeric except specific substring

Edit:
MANDATORY CONDITION:
Regex has to be inserted into the following statement:
Regex regex = new Regex("<REGEX_STRING>");
val= regex.Matches(val).Cast<Match>().Aggregate("", (s, e) => s + e.Value, s => s);
I found out that I can't use Regex.Replace() method as it was suggested in the answer below.
I am looking for a RegEx that would have to follow two conditions:
accept only a-z, A-Z, 0-9, \s (one or more), and ignore _ (that's why \w is not an option)
[!] exclude any {sq} "substring" anywhere inside the string
*{sq} - it's literally this 4-chars string, not any shortcut for ASCII sign !
What I have so far is:
\b(?!sq)[a-zA-Z0-9 ]*
but this RegEx cuts everything when _ shows up + it also excludes i.e whole [sq].
So for example for a given string:
test[sq]uirrel{sq}_things I should get testsquirrelthings and what I get is: testuirrel
Small input | expected output table below:
Input string
Expected output
Na#me
Name
M2a_ny
M2any
Vari{sq}o#us
Various
test [sq]uirrel h23ere!
test squirrel h23ere
I would really appreciate any help, it's the most complicated RegEx I have ever come across 🙄
The problem is that it is not possible in .NET regex to match any text but a multicharacter sequence.
You will have to use a terrible workaround like
((?:(?!{sq})[A-Za-z0-9\s])+)|{sq}
and you will need to get Group 1 values. See the .NET regex demo. Here is a C# demo:
var texts = new List<string> { "Na#me","M2a_ny","Vari{sq}o#us","test [sq]uirrel h23ere!" };
var pattern = @"((?:(?!{sq})[A-Za-z0-9\s])+)|{sq}";
foreach (var text in texts) {
var result = Regex.Matches(text, pattern).Cast<Match>()
.Aggregate("", (s, e) => s + e.Groups[1].Value, s => s);
Console.WriteLine(result);
}
// => Name, M2any, Various, test squirrel h23ere
A better, Regex.Replace based solution
You can remove {sq} and all chars other than letters, digits and whitespace using either of
Regex.Replace(text, @"{sq}|[^a-zA-Z0-9\s]", "")
Regex.Replace(text, @"{sq}|[^\p{L}\p{N}\s]", "")
The \p{L} / \p{N} version can be used to support any Unicode letters/digits.
See the .NET regex demo.
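As a rough sketch of the Regex.Replace approach applied to the inputs from the question (the names ReplaceSketch and Clean are hypothetical, and this only works where Regex.Replace is allowed, which the question's edit rules out):

```csharp
using System;
using System.Text.RegularExpressions;

public static class ReplaceSketch
{
    // Hypothetical helper: deletes the literal {sq} marker first, then any
    // remaining char that is not an ASCII letter, a digit or whitespace.
    public static string Clean(string text)
    {
        return Regex.Replace(text, @"{sq}|[^a-zA-Z0-9\s]", "");
    }

    public static void Main()
    {
        var texts = new[] { "Na#me", "M2a_ny", "Vari{sq}o#us", "test [sq]uirrel h23ere!" };
        foreach (var text in texts)
        {
            Console.WriteLine(Clean(text));
        }
        // => Name, M2any, Various, test squirrel h23ere
    }
}
```

The {sq} alternative comes first so that the whole marker is removed before the character class gets a chance to strip only its braces.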

Regex.Split string into substrings by a delimiter while preserving whitespace

I created a Regex to split a string by a delimiter ($), but it's not working the way I want.
var str = "sfdd fgjhk fguh $turn.bak.orm $hahr*____f";
var list = Regex.Split(str, @"(\$\w+)").Where(x => !string.IsNullOrEmpty(x)).ToList();
foreach (var item in list)
{
Console.WriteLine(item);
}
Output:
"sfdd fgjhk fguh "
"$turn"
".bak.orm "
"$hahr"
"*____f"
The problem is \w+ is not matching any periods or stars. Here's the output I want:
"sfdd fgjhk fguh "
"$turn.bak.orm"
" "
"$hahr*____f"
Essentially, I want to split a string by $ and make sure $ appears at the beginning of a substring and nowhere else (it's okay for a substring to be $ only). I also want to make sure whitespace characters are preserved as in the first substring, but any match should not contain whitespace as in the second and fourth cases. I don't care for case sensitivity.
It appears you want to split with a pattern that starts with a dollar and then has any 0 or more chars other than whitespace and dollar chars:
var list = Regex.Split(s, @"(\$[^\s$]*)")
    .Where(x => !string.IsNullOrEmpty(x))
    .ToList();
Details
( - start of a capturing group (so that Regex.Split, while tokenizing the string, keeps the matches in the resulting array)
\$ - a dollar sign
[^\s$]* - a negated character class matching 0 or more chars other than whitespace (\s) and dollar symbols
) - end of the capturing group.
See the regex demo.
To include a second delimiter, such as €, you may use @"([€$][^\s€$]*)".
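A minimal sketch that runs the single-delimiter pattern against the string from the question; the names SplitSketch and SplitOnDollar are invented for this example:

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text.RegularExpressions;

public static class SplitSketch
{
    // Hypothetical helper: splits on "$" tokens and keeps them in the result,
    // because the whole pattern sits inside a capturing group.
    public static List<string> SplitOnDollar(string s)
    {
        return Regex.Split(s, @"(\$[^\s$]*)")
                    .Where(x => !string.IsNullOrEmpty(x))
                    .ToList();
    }

    public static void Main()
    {
        var str = "sfdd fgjhk fguh $turn.bak.orm $hahr*____f";
        foreach (var item in SplitOnDollar(str))
        {
            // Quotes make the preserved whitespace visible.
            Console.WriteLine("\"" + item + "\"");
        }
    }
}
```

For the sample string this yields the four pieces asked for in the question: "sfdd fgjhk fguh ", "$turn.bak.orm", " " and "$hahr*____f".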

Regex first digits occurrence

My task is to extract the first digits in the following string:
GLB=VSCA|34|speed|1|
My pattern is the following:
(?x:VSCA(\|){1}(\d.))
Basically I need to extract "34", the first digits occurrence after the "VSCA". With my pattern I obtain a group, but would it be possible to get only the number? This is my C# snippet:
string regex = @"(?x:VSCA(\|){1}(\d.))";
Regex rx = new Regex(regex);
string s = "GLB=VSCA|34|speed|1|";
if (rx.Match(s).Success)
{
var test = rx.Match(s).Groups[1].ToString();
}
You could match 34 (the first digits after VSCA) using a positive lookbehind (?<=VSCA\D*) to assert that what is on the left side is VSCA followed by zero or more non-digit characters (\D*), and then match one or more digits with \d+:
(?<=VSCA\D*)\d+
If you need the pipe to come right after VSCA, then you could include it in the lookbehind:
(?<=VSCA\|)\d+
Demo
This regex pattern: (?<=VSCA\|)\d+?(?=\|) will match only the number. (If your number can be negative / have decimal places you may want to use (?<=VSCA\|).+?(?=\|) instead)
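With a lookbehind, the number is the whole match, so no group lookup is needed. A minimal sketch in C#, using the (?<=VSCA\|)\d+ variant; the names LookbehindSketch and FirstNumberAfterVsca are hypothetical:

```csharp
using System;
using System.Text.RegularExpressions;

public static class LookbehindSketch
{
    // Hypothetical helper: returns the digits right after "VSCA|",
    // or null when the input has no such digits.
    public static string FirstNumberAfterVsca(string s)
    {
        // The lookbehind only asserts that "VSCA|" precedes the digits;
        // it is not part of the match, so m.Value holds the number alone.
        Match m = Regex.Match(s, @"(?<=VSCA\|)\d+");
        return m.Success ? m.Value : null;
    }

    public static void Main()
    {
        Console.WriteLine(FirstNumberAfterVsca("GLB=VSCA|34|speed|1|")); // prints 34
    }
}
```

Calling Regex.Match once and reusing the result also avoids running the pattern twice, as the snippet in the question does.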
You don't need Regex for this, you can simply split on the '|' character:
string s = "GLB=VSCA|34|speed|1|";
string[] parts = s.Split('|');
if(parts.Length >= 2)
{
Console.WriteLine(parts[1]); //prints 34
}
The benefit here is that you can access all parts of the original string based on the index:
[0] - "GLB=VSCA"
[1] - "34"
[2] - "speed"
[3] - "1"
Fiddle here
While the other answers work really well, if you really must use a regular expression, or are interested in knowing how to get to that straight away you can use a named group for the number. Consider the following code:
string regex = @"(?x:VSCA(\|){1}(?<number>\d+))";
Regex rx = new Regex(regex);
string s = "GLB:VSCA|34|speed|1|";
var match = rx.Match(s);
if(match.Success) Console.WriteLine(match.Groups["number"]);
How about (?<=VSCA\|)[0-9]+?
Try it out here

C# Regex - starts with pattern1 not contain pattern2

The following input string contains all of these lines:
a1.aaa[SUBSCRIBED]
a1.bbb
a1.ccc
b1.ddd
d1.ddd[SUBSCRIBED]
I want to get the output:
bbb
ccc
which means: all the words that come after "a1." and do not contain the substring "[SUBSCRIBED]"
all the words comes after "a1." And not contains the substring
"[SUBSCRIBED]"
Why regex? Following is crystal clear:
var result = strings
.Where(s => s.StartsWith("a1.") && !s.Contains("[SUBSCRIBED]"))
.Select(s => s.Substring(3));
Tim's answer makes sense. If you insist on a regex, though, I would venture something like this:
^a1\.(.*)(?<!\[SUBSCRIBED\])$
where ^a1\. requires the string to start with a1.
(.*) captures any number of characters
and the negative lookbehind (?<!\[SUBSCRIBED\])$ rejects text ending with [SUBSCRIBED]
You may use
^a1\.(?!.*\[SUBSCRIBED])(.*)
See the regex demo.
Details
^ - start of string
a1\. - a literal a1. substring
(?!.*\[SUBSCRIBED]) - a negative lookahead that fails the match if a [SUBSCRIBED] substring is present after any 0+ chars (other than newline if the RegexOptions.Singleline option is not used)
(.*) - Group 1: the rest of the line up to the end (if you use RegexOptions.Singleline option, . will match newlines as well).
C# code:
var result = string.Empty;
var m = Regex.Match(s, @"^a1\.(?!.*\[SUBSCRIBED])(.*)");
if (m.Success)
{
result = m.Groups[1].Value;
}
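If the input is one multi-line string rather than separate strings, the same pattern can be applied per line with RegexOptions.Multiline. This is only a sketch under the assumption that lines are separated by "\n" alone (with "\r\n" the captured group would end in "\r"); the names MultilineSketch and Extract are hypothetical:

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text.RegularExpressions;

public static class MultilineSketch
{
    // Hypothetical helper: returns the text after "a1." for every line
    // that starts with "a1." and does not contain "[SUBSCRIBED]".
    public static List<string> Extract(string input)
    {
        // Multiline makes ^ match at the start of each line, not only
        // at the start of the whole string.
        return Regex.Matches(input, @"^a1\.(?!.*\[SUBSCRIBED])(.*)", RegexOptions.Multiline)
                    .Cast<Match>()
                    .Select(m => m.Groups[1].Value)
                    .ToList();
    }

    public static void Main()
    {
        var input = "a1.aaa[SUBSCRIBED]\na1.bbb\na1.ccc\nb1.ddd\nd1.ddd[SUBSCRIBED]";
        Console.WriteLine(string.Join("\n", Extract(input)));
    }
}
```

Since . does not match "\n" without RegexOptions.Singleline, both the lookahead and the capture stay within a single line.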
