Regex - Match everything except specified characters or split the string - c#

I'm using the regex [^,]+, which matches groups of non-commas, but I need it to do the same for the characters ;, \n, and space.
I have the string 12f3,, 456;;;;\n\n227- , 999, from which I need to get the substrings 12f3, 456, 227- and 999.
Is there a way of matching everything except some specified characters, or is it best to use Split in this situation?

You can match these substrings with [^\s;,]+ pattern. Splitting with [\s;,]+ is not recommended as there are often empty strings in the resulting array after splitting (due to either matches at the start/end of string, or consecutive matches).
See the regex demo.
In C#, use
var matches = Regex.Matches(text, @"[^\s;,]+")
    .Cast<Match>()
    .Select(x => x.Value)
    .ToList();
The [^\s;,]+ pattern matches one or more (due to the + quantifier) occurrences of any char other than ([^...] is a negated character class) a whitespace (\s), a semicolon, or a comma.
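To make the difference between matching and splitting concrete, here is a minimal sketch. The sample string is an assumption chosen so that it starts and ends with a delimiter; the variable names are illustrative only.

```csharp
using System;
using System.Linq;
using System.Text.RegularExpressions;

// Hypothetical input with a leading and a trailing delimiter.
var sample = ",12f3,, 456;";

// Splitting on runs of delimiters leaves an empty string at each end.
string[] splitParts = Regex.Split(sample, @"[\s;,]+");

// Matching runs of non-delimiters returns only the tokens.
string[] matchedParts = Regex.Matches(sample, @"[^\s;,]+")
    .Cast<Match>()
    .Select(m => m.Value)
    .ToArray();

Console.WriteLine(string.Join("|", splitParts));   // |12f3|456|
Console.WriteLine(string.Join("|", matchedParts)); // 12f3|456
```

With Regex.Split you would still need to filter out the empty entries, which is why the matching approach tends to be simpler here.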
Non-regex approach
You can split your string with comma and semi-colon, remove empty entries and then trim the strings in the resulting array:
var text = "12f3,, 456;;;;\n\n227- , 999";
var res = text.Split(new[] {";", ","}, StringSplitOptions.RemoveEmptyEntries)
.Select(x => x.Trim());
Console.WriteLine(string.Join("\n", res));
See the C# demo. Output:
12f3
456
227-
999

Related

What's the best way to acquire a list of strings that match a string inside a string, by looking through a string list?

Basically I have a string array that I am using to match inside a single string:
string[] matches = { "{A}", "{B}", "{CC}" };
Then I check whether any of these occur inside my string:
string text = "Lorem Ipsum is {CC} simply dummy text {A} of the {CC} printing and typesetting industry {B}.";
In which case, the resulting array I want to gather should be:
string[] allmatches = { "{CC}", "{A}", "{CC}", "{B}" };
Is there an easy way to do this using LINQ or maybe Regex?
Construct the regex by first escaping each element in matches with Regex.Escape (via Select), then joining them with |. After that, get the Matches of the regex against text and Select the Values:
var regex = string.Join("|", matches.Select(Regex.Escape));
var result = Regex.Matches(text, regex)
.Cast<Match>()
.Select(x => x.Value).ToArray();
Assuming that {A}..{Z} are the only matches required, we can combine Regex and Linq, e.g.
string text =
    @"Lorem Ipsum is {C} simply dummy text {A} of the {C} printing and typesetting industry {B}.";
string[] allmatches = Regex
    .Matches(text, @"\{[A-Z]\}")
    .Cast<Match>()
    .Select(m => m.Value)
    //.Where(item => matches.Contains(item)) // uncomment to validate matches
    .ToArray();
Let's have a look:
Console.Write(string.Join(", ", allmatches));
Outcome:
{C}, {A}, {C}, {B}
Edit: uncomment .Where(...) if you want matches which are in matches[] only
Edit 2: If a match doesn't necessarily contain one letter only, change the pattern:
.Matches(text, @"\{[A-Z]+\}") // one or more capital letters
.Matches(text, @"\{[a-zA-Z]+\}") // one or more English letters
.Matches(text, @"\{\p{L}+\}") // one or more Unicode letters
.Matches(text, @"\{[^}{]+\}") // one or more characters except "{" and "}"

Regex.Split string into substrings by a delimiter while preserving whitespace

I created a Regex to split a string by a delimiter ($), but it's not working the way I want.
var str = "sfdd fgjhk fguh $turn.bak.orm $hahr*____f";
var list = Regex.Split(str, @"(\$\w+)").Where(x => !string.IsNullOrEmpty(x)).ToList();
foreach (var item in list)
{
Console.WriteLine(item);
}
Output:
"sfdd fgjhk fguh "
"$turn"
".bak.orm "
"$hahr"
"*____f"
The problem is \w+ is not matching any periods or stars. Here's the output I want:
"sfdd fgjhk fguh "
"$turn.bak.orm"
" "
"$hahr*____f"
Essentially, I want to split a string by $ and make sure $ appears at the beginning of a substring and nowhere else (it's okay for a substring to be $ only). I also want to make sure whitespace characters are preserved as in the first substring, but any match should not contain whitespace as in the second and fourth cases. I don't care for case sensitivity.
It appears you want to split with a pattern that starts with a dollar and then has any 0 or more chars other than whitespace and dollar chars:
var list = Regex.Split(str, @"(\$[^\s$]*)")
    .Where(x => !string.IsNullOrEmpty(x))
    .ToList();
Details
( - start of a capturing group (so that Regex.Split keeps the matched delimiters in the resulting array instead of discarding them)
\$ - a dollar sign
[^\s$]* - a negated character class matching 0 or more chars other than whitespace (\s) and dollar symbols
) - end of the capturing group.
See the regex demo:
To include a second delimiter, you may use @"([€$][^\s€$]*)".
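As a rough illustration of the two-delimiter variant, the sketch below runs that pattern on a made-up string (the sample input and variable names are assumptions, not part of the question):

```csharp
using System;
using System.Linq;
using System.Text.RegularExpressions;

// Hypothetical sample containing both delimiters.
var twoDelims = "a $b.c €d*e";

// Inside a character class, $ is a literal dollar sign.
var tokens = Regex.Split(twoDelims, @"([€$][^\s€$]*)")
    .Where(x => !string.IsNullOrEmpty(x))
    .ToList();

foreach (var t in tokens)
{
    Console.WriteLine($"\"{t}\"");
}
// Prints:
// "a "
// "$b.c"
// " "
// "€d*e"
```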

regex to split numbers and operators

How do I split the string below into a list of strings with numbers and operators separated? (The string does not contain parentheses or negative numbers.)
Example:
inputString = 1+2-2.3*4/12.12
outputList = {1,+,2,-,2.3,*,4,/,12.12}
Below will give me numbers only. I need operators as well
var digits = Regex.Split(inputString, @"\D+");
Since you confirm the structure of the input is rather simplistic - no parentheses, no negative numbers - you can just use a simple \s*([-+/*])\s* regex to split the string.
Note that Regex.Split will also output all captured substrings in the result:
If capturing parentheses are used in a Regex.Split expression, any captured text is included in the resulting string array.
So, use
Regex.Split(input, @"\s*([-+/*])\s*")
    .Where(n => !string.IsNullOrEmpty(n))
    .ToList();
Just do not forget to remove empty elements from the resulting list/array.
Pattern details:
\s* - zero or more whitespaces (to "trim" the elements)
([-+/*]) - Group 1 capturing a -, +, / or *
\s* - zero or more whitespaces (to "trim" the elements)
See the IDEONE demo:
var input = "1+2-2.3*4/12.12";
var results = Regex.Split(input, @"\s*([-+/*])\s*")
    .Where(n => !string.IsNullOrEmpty(n))
    .ToList();
Console.WriteLine(string.Join("\n", results));
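The effect of the capturing group on Regex.Split can be seen by running the same split with and without parentheses. This is a small sketch on an assumed sample expression; the variable names are illustrative.

```csharp
using System;
using System.Text.RegularExpressions;

var expr = "1+2-3"; // hypothetical sample

// Without a capturing group the operators are discarded.
string[] withoutGroup = Regex.Split(expr, @"[-+]");

// With a capturing group the operators are kept as separate elements.
string[] withGroup = Regex.Split(expr, @"([-+])");

Console.WriteLine(string.Join(" ", withoutGroup)); // 1 2 3
Console.WriteLine(string.Join(" ", withGroup));    // 1 + 2 - 3
```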
You could use Regex.Matches instead of Regex.Split :
var test = "1 + 2 - 2.3 * 4 / 12.12";
foreach (Match match in Regex.Matches(test, @"\d+(,\d+)*(\.\d+(e\d+)?)|\d+|[-+*/]"))
    Console.WriteLine(match.Value);
This seemed to work for me
/([\d\.]+)|([-+\/\*])+/g
FYI - LinqPad is an awesome tool to test Regex in C#

How do I split a string by a character, but only when it is not contained within parentheses?

Input: ((Why,Heck),(Ask,Me),(Bla,No))
How can I split this data into a string array:
Element1 (Why,Heck)
Element2 (Ask,Me)
Element3 (Bla,No)
I tried String.Split and String.TrimEnd/TrimStart, but the result is always wrong.
Would it be better with Regex?
var input = "((Why,Heck),(Ask,Me),(Bla,No))";
var result = Regex.Matches(input, @"\([^\(\)]+?\)")
    .Cast<Match>()
    .Select(m => m.Value)
    .ToList();
Another - non regex approach which should work:
string[] result = str.Split(new[]{"),"}, StringSplitOptions.None)
.Select(s => string.Format("({0})", s.Trim('(', ')')))
.ToArray();
You could also:
remove all parentheses to simplify the split
split by ','
read the returned array in groups of two, using a for loop or similar, taking indices 0 and 1, 2 and 3, etc.
reconstruct each pair with parentheses
Or you could just use regular expressions.
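The manual steps listed above could be sketched roughly as follows. This assumes every group holds exactly two items and that there is no deeper nesting; the variable names are illustrative.

```csharp
using System;
using System.Collections.Generic;

var nested = "((Why,Heck),(Ask,Me),(Bla,No))";

// Step 1: remove all parentheses.
string flat = nested.Replace("(", "").Replace(")", "");

// Step 2: split by ','.
string[] items = flat.Split(',');

// Steps 3 and 4: walk the array two items at a time and re-wrap each pair.
var pairs = new List<string>();
for (int i = 0; i + 1 < items.Length; i += 2)
{
    pairs.Add($"({items[i]},{items[i + 1]})");
}

Console.WriteLine(string.Join("\n", pairs));
// (Why,Heck)
// (Ask,Me)
// (Bla,No)
```

If a group could contain one or three items, this pairing breaks down, which is where the regex approach is more robust.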

Split by comma if that comma is not located between two double quotes

I am looking to split a string like this by comma:
field1:"value1", field2:"value2", field3:"value3,value4"
into a string[] that would look like:
0 field1:"value1"
1 field2:"value2"
2 field3:"value3,value4"
I am trying to do that with Regex.Split but can't seem to work out the regular expression.
It'll be much easier to do this with Matches than with Split, for example
string[] asYouWanted = Regex.Matches(input, @"[A-Za-z0-9]+:"".*?""")
    .Cast<Match>()
    .Select(m => m.Value)
    .ToArray();
although if there is any chance of your values (or fields!) containing escaped quotes (or anything similarly tricky), then you might be better off with a proper CSV parser.
If you do have escaped quotes in your values, I think the following regex should work - give it a test:
@"field3:""value3\\"",value4""", @"[A-Za-z0-9]+:"".*?(?<=(?<!\\)(\\\\)*)"""
The added (?<=(?<!\\)(\\\\)*) is supposed to make sure that the " it stops matching on is preceded by an even number of backslashes only, as an odd number of backslashes means it is escaped.
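A quick way to try the lookbehind is to feed it a value containing a backslash-escaped quote. The sketch below is only a check on an assumed input (one backslash before the inner quote); the variable names are illustrative.

```csharp
using System;
using System.Linq;
using System.Text.RegularExpressions;

// Hypothetical input: field1:"value1", field3:"value3\",value4"
// The quote after value3 is escaped with a single backslash.
var escapedInput = @"field1:""value1"", field3:""value3\"",value4""";

// Same pattern as above: the closing quote must follow an even number of backslashes.
var escPattern = @"[A-Za-z0-9]+:"".*?(?<=(?<!\\)(\\\\)*)""";

string[] fields = Regex.Matches(escapedInput, escPattern)
    .Cast<Match>()
    .Select(m => m.Value)
    .ToArray();

foreach (var f in fields)
{
    Console.WriteLine(f);
}
```

The escaped quote is skipped and the second match runs to the final, unescaped quote, so field3 comes back as one piece.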
Untested but this should be Ok:
string[] parts = input.Split(new string[] { "\"," }, StringSplitOptions.None);
remember to add the " back on the end if you need it.
string[] arr = str.Split(new string[] { "\"," }, StringSplitOptions.None)
    .Select(s => s.Trim())
    .Select(s => s.EndsWith("\"") ? s : s + "\"") // the last part keeps its own closing quote
    .ToArray();
Split by ", as webnoob mentioned, then append the trailing " using a Select, and convert the result to an array.
try this
// (\w.+?):"(\w.+?)"
//
// Match the regular expression below and capture its match into backreference number 1 «(\w.+?)»
// Match a single character that is a “word character” (letters, digits, and underscores) «\w»
// Match any single character that is not a line break character «.+?»
// Between one and unlimited times, as few times as possible, expanding as needed (lazy) «+?»
// Match the characters “:"” literally «:"»
// Match the regular expression below and capture its match into backreference number 2 «(\w.+?)»
// Match a single character that is a “word character” (letters, digits, and underscores) «\w»
// Match any single character that is not a line break character «.+?»
// Between one and unlimited times, as few times as possible, expanding as needed (lazy) «+?»
// Match the character “"” literally «"»
var parts = new List<string>(); // requires using System.Collections.Generic;
try {
    Regex regObj = new Regex(@"(\w.+?):""(\w.+?)""");
    Match matchResults = regObj.Match(sourceString);
    // Collect every match; the number of matches is not known up front.
    while (matchResults.Success) {
        parts.Add(matchResults.Value);
        matchResults = matchResults.NextMatch();
    }
} catch (ArgumentException ex) {
    // Syntax error in the regular expression
}
string[] arr = parts.ToArray();
The easiest inbuilt way is here. I checked it and it works fine. It splits "Hai,\"Hello,World\"" into {"Hai","Hello,World"}.
