Regex.Split string into substrings by a delimiter while preserving whitespace - c#

I created a Regex to split a string by a delimiter ($), but it's not working the way I want.
var str = "sfdd fgjhk fguh $turn.bak.orm $hahr*____f";
var list = Regex.Split(str, #"(\$\w+)").Where(x => !string.IsNullOrEmpty(x)).ToList();
foreach (var item in list)
{
Console.WriteLine(item);
}
Output:
"sfdd fgjhk fguh "
"$turn"
".bak.orm "
"$hahr"
"*____f"
The problem is \w+ is not matching any periods or stars. Here's the output I want:
"sfdd fgjhk fguh "
"$turn.bak.orm"
" "
"$hahr*____f"
Essentially, I want to split a string by $ and make sure $ appears at the beginning of a substring and nowhere else (it's okay for a substring to be $ only). I also want to make sure whitespace characters are preserved as in the first substring, but any match should not contain whitespace as in the second and fourth cases. I don't care for case sensitivity.

It appears you want to split with a pattern that starts with a dollar and then has any 0 or more chars other than whitespace and dollar chars:
var list = Regex.Split(s, #"(\$[^\s$]*)")
.Where(x => !string.IsNullOrEmpty(x))
.ToList();
Details
( - start of a capturing group (so that Regex.Split tokenized the string, could keep the matches inside the resulting array)
\$ - a dollar sign
[^\s$]* - a negated character class matching 0 or more chars other than whitespace (\s) and dollar symbols
) - end of the capturing group.
See the regex demo:
To include a second delimiter, you may use #"([€$][^\s€$]*)".

Related

Regex - Match everything except specified characters or split the string

I'm using this Regex [^,]+ that matches groups of non-commas but it needs to do this also for characters ; , \n and empty space.
I have this string 12f3,, 456;;;;\n\n227- , 999 from which I need to get all the substrings like 12f3 , 456, 227- and 999.
Is there a way of matching everything except some specified characters, or is best to use split in this situation?
You can match these substrings with [^\s;,]+ pattern. Splitting with [\s;,]+ is not recommended as there are often empty strings in the resulting array after splitting (due to either matches at the start/end of string, or consecutive matches).
See the regex demo.
In C#, use
var matches = Regex.Matches(text, #"[^\s;,]+")
.Cast<Match>()
.Select(x => x.Value)
.ToList();
The [^\s;,]+ matches one or more (due to + quantifier) occurrences of any char other than ([^...] is a negated character class) a whitespace (\s), semi-colona dn a comma.
Non-regex approach
You can split your string with comma and semi-colon, remove empty entries and then trim the strings in the resulting array:
var text = "12f3,, 456;;;;\n\n227- , 999";
var res = text.Split(new[] {";", ","}, StringSplitOptions.RemoveEmptyEntries)
.Select(x => x.Trim());
Console.WriteLine(string.Join("\n", res));
See the C# demo. Output:
12f3
456
227-
999

Match properties using regex

I have a string like that represent a set of properties, for example:
AB=0, TX="123", TEST=LDAP, USR=" ", PROPS="DN=VB, XN=P"
I need to extract this properties in:
AB=0
TX=123
TEST=LDAP
USR=
PROPS=DN=VB, XN=P
To resolve this problem I tried to use a regex, but without success.
public IEnumerable<string> SplitStr(string input)
{
Regex reg= new Regex("((?<=\")[^\"]*(?=\"(,|$)+)|(?<=,|^)[^,\"]*(?=,|$))", RegexOptions.Compiled);
foreach (Match match in reg.Matches(input))
{
yield return match.Value.Trim(',');
}
}
I can't find the ideal regex to expected output. With the above regex the output is:
AB=0
123
TEST=LDAP
DN=VB, XN=P
Anyone can help me?
You may use
public static IEnumerable<string> SplitStr(string input)
{
var matches = Regex.Matches(input, #"(\w+=)(?:""([^""]*)""|(\S+)\b)");
foreach (Match match in matches)
{
yield return string.Concat(match.Groups.Cast<Group>().Skip(1).Select(x => x.Value)).Trim();
}
}
The regex details:
(\w+=) - Group 1: one or more word chars and a = char
(?:""([^""]*)""|(\S+)\b) - a non-capturing group matching either of the two alternatives:
"([^"]*)" - a ", then 0 or more chars other than " and then a "
| - or
(\S+)\b - any 1+ chars other than whitespace, as many as possible, up to the word boundary position.
See the regex demo.
The string.Concat(match.Groups.Cast<Group>().Skip(1).Select(x => x.Value)).Trim() code omits the Group 0 (whole match) value from the groups, takes Group 1, 2 and 3 and concats them into a single string, and trims it afterwards.
C# test:
var s = "AB=0, TX=\"123\", TEST=LDAP, USR=\" \", PROPS=\"DN=VB, XN=P\"";
Console.WriteLine(string.Join("\n", SplitStr(s)));
Output:
AB=0
TX=123
TEST=LDAP
USR=
PROPS=DN=VB, XN=P
Another way could be to use 2 capturing groups where the first group captures the first part including the equals sign and the second group captures the value after the equals sign.
Then you can concatenate the groups and use Trim to remove the double quotes. If you also want to remove the whitespaces after that, you could use Trim again.
([^=\s,]+=)("[^"]+"|[^,\s]+)
That will match
( First capturing group
[^=\s,]+= Match 1+ times not an equals sign, comma or whitespace char, then match = (If the property name can contain a comma, you could instead use character class and specify what you would allow to match like for example[\w,]+)
) Close group
( Second capturing group
"[^"]+" Match from opening till closing double quote
| Or
[^,\s]+ Match 1+ times not a comma or whitespace char
)
Regex demo | C# demo
Your code might look like:
public IEnumerable<string> SplitStr(string input)
{
foreach (Match m in Regex.Matches(input, #"([^=\s,]+=)(""[^""]+""|[^,\s]+)"))
{
yield return string.Concat(m.Groups[1].Value, m.Groups[2].Value.Trim('"'));
}
}

regex to split numbers and operators

How do I split below string to list of string with numbers and operators separated (string does not contain parenthesis or negative numbers).
Example:
inputString = 1+2-2.3*4/12.12
outputList = {1,+,2,-,2.3,*,4,/,12.12}
Below will give me numbers only. I need operators as well
var digits = Regex.Split(inputString , #"\D+");
Since you confirm the structure of the input is rather simplistic - no parentheses, no negative numbers - you can just use a simple \s*([-+/*])\s* regex to split the string.
Note that Regex.Split will also output all captured substrings in the result:
If capturing parentheses are used in a Regex.Split expression, any captured text is included in the resulting string array.
So, use
Regex.Split(input, #"\s*([-+/*])\s*")
.Where(n => !string.IsNullOrEmpty(n))
.ToList();
Just do not forget to remove empty elements from the resulting list/array.
Pattern details:
\s* - zero or more whitespaces (to "trim" the elements)
([-+/*]) - Group 1 capturing a -, +, / or *
\s* - zero or more whitespaces (to "trim" the elements)
See the IDEONE demo:
var input = "1+2-2.3*4/12.12";
var results = Regex.Split(input, #"\s*([-+/*])\s*")
.Where(n => !string.IsNullOrEmpty(n))
.ToList();
Console.WriteLine(string.Join("\n", results));
You could use Regex.Matches instead of Regex.Split :
var test = "1 + 2 - 2.3 * 4 / 12.12";
foreach(Match match in Regex.Matches(test, #"\d+(,\d+)*(\.\d+(e\d+)?)|\d+|[\\+-\\*]"))
Console.WriteLine(match.Value);
This seemed to work for me
/([\d\.]+)|([+-\/\*])+/g
FYI - LinqPad is an awesome tool to test Regex in C#

How to replace words following certain character and extract rest with REGEX

Assume that i have the following sentence
select PathSquares from tblPathFinding where RouteId=470
and StartingSquareId=267 and ExitSquareId=13
Now i want to replace words followed by = and get the rest of the sentence
Lets say i want to replace following word of = with %
Words are separated with space character
So this sentence would become
select PathSquares from tblPathFinding where RouteId=%
and StartingSquareId=% and ExitSquareId=%
With which regex i can achieve this ?
.net 4.5 C#
Use a lookbehind to match all the non-space or word chars which are just after to = symbol . Replacing the matched chars with % wiil give you the desired output.
#"(?<==)\S+"
OR
#"(?<==)\w+"
Replacement string:
%
DEMO
string str = #"select PathSquares from tblPathFinding where RouteId=470
and StartingSquareId=267 and ExitSquareId=13";
string result = Regex.Replace(str, #"(?<==)\S+", "%");
Console.WriteLine(result);
IDEONE
Explanation:
(?<==) Asserts that the match must be preceded by an = symbol.
\w+ If yes, then match the following one or more word characters.

Split by comma if that comma is not located between two double quotes

I am looking to split such string by comma :
field1:"value1", field2:"value2", field3:"value3,value4"
into a string[] that would look like:
0 field1:"value1"
1 field2:"value2"
2 field3:"value3,value4"
I am trying to do that with Regex.Split but can't seem to work out the regular expression.
It'll be much easier to do this with Matches than with Split, for example
string[] asYouWanted = Regex.Matches(input, #"[A-Za-z0-9]+:"".*?""")
.Cast<Match>()
.Select(m => m.Value)
.ToArray();
although if there is any chance of your values (or fields!) containing escaped quotes (or anything similarly tricky), then you might be better off with a proper CSV parser.
If you do have escaped quotes in your values, I think the following regex the work - give it a test:
#"field3:""value3\\"",value4""", #"[A-Za-z0-9]+:"".*?(?<=(?<!\\)(\\\\)*)"""
The added (?<=(?<!\\)(\\\\)*) is supposed to make sure that the " it stops matching on is preceeded by only an even number of slashes, as an odd number of slashes means it is escaped.
Untested but this should be Ok:
string[] parts = string.Split(new string[] { ",\"" }, StringSplitOptions.None);
remember to add the " back on the end if you need it.
string[] arr = str.Split(new string[] {"\","}}, StringSplitOptions.None).Select(str => str + "\"").ToArray();
Split by \, as webnoob mentioned and then suffix with the trailing " using a select, then cast to an array.
try this
// (\w.+?):"(\w.+?)"
//
// Match the regular expression below and capture its match into backreference number 1 «(\w.+?)»
// Match a single character that is a “word character” (letters, digits, and underscores) «\w»
// Match any single character that is not a line break character «.+?»
// Between one and unlimited times, as few times as possible, expanding as needed (lazy) «+?»
// Match the characters “:"” literally «:"»
// Match the regular expression below and capture its match into backreference number 2 «(\w.+?)»
// Match a single character that is a “word character” (letters, digits, and underscores) «\w»
// Match any single character that is not a line break character «.+?»
// Between one and unlimited times, as few times as possible, expanding as needed (lazy) «+?»
// Match the character “"” literally «"»
try {
Regex regObj = new Regex(#"(\w.+?):""(\w.+?)""");
Match matchResults = regObj.Match(sourceString);
string[] arr = new string[match.Captures.Count];
int i = 0;
while (matchResults.Success) {
arr[i] = matchResults.Value;
matchResults = matchResults.NextMatch();
i++;
}
} catch (ArgumentException ex) {
// Syntax error in the regular expression
}
The easiest inbuilt way is here. I checed it . It is working fine. It splits "Hai,\"Hello,World\"" into {"Hai","Hello,World"}

Categories

Resources