Regular expression to split a string - C#

I have a string like
[123,234,321,....]
Can I use a regular expression to extract only the numbers?
This is the code I am using now:
public static string[] Split()
{
string s = "[123,234,345,456,567,678,789,890,100]";
var temp = s.Replace("[", "").Replace("]", "");
char[] separator = { ',' };
return temp.Split(separator);
}

You can use string.Split for this - no need for a regular expression, though your code can be simplified:
var nums = s.Split('[', ']', ',');
Though you may want to exclude empty entries from the returned array:
var nums = s.Split(new[] { '[', ']', ',' },
StringSplitOptions.RemoveEmptyEntries);
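To see why the option matters, here is a minimal sketch (written as C# 9+ top-level statements, with a shortened sample string) that compares the two calls. The variable names `raw` and `nums` are only illustrative.

```csharp
// Top-level statements; assumes C# 9 or later.
using System;

string s = "[123,234,345]";

// Plain split: the leading '[' and the trailing ']' each leave an empty entry.
string[] raw = s.Split('[', ']', ',');
Console.WriteLine(raw.Length);              // 5

// RemoveEmptyEntries drops those two empty strings.
string[] nums = s.Split(new[] { '[', ']', ',' }, StringSplitOptions.RemoveEmptyEntries);
Console.WriteLine(string.Join("|", nums));  // 123|234|345
```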

There's an overload to Trim() that takes a character.
You could do this.
string s = "[123,234,345,456,567,678,789,890,100]";
var nums = s.Trim('[').Trim(']').Split(',');

If you want to use a regular expression, try:
string s = "[123,234,345,456,567,678,789,890,100]";
var matches = Regex.Matches(s, @"[0-9]+", RegexOptions.Compiled);
However, regular expressions tend to make your code less readable, so you might stick with your original approach.

Try using the string.Split method:
string s = "[123,234,345,456,567,678,789,890,100]";
var numbers = s.Split('[',']', ',');
foreach(var i in numbers )
Console.WriteLine(i);
Here is a DEMO.
EDIT: As Oded mentioned, you may want to use StringSplitOptions.RemoveEmptyEntries also.

string s = "[123,234,345,456,567,678,789,890,100]";
MatchCollection matches = Regex.Matches(s, @"(\d+)[,\]]");
string[] result = matches.OfType<Match>().Select(m => m.Groups[1].Value).ToArray();
Here the @ signifies a verbatim string literal, which allows the backslash '\' to be used directly in regular expression notation without itself being escaped as "\\".
\d is a digit, and \d+ means one or more digits. The parentheses form a group, so (\d+) means "capture a group of digits". (The group is used a little later.)
[,\]] is a character class: square brackets mean "match any one of these characters", so it matches either the comma , or the closing square bracket ], which has to be escaped.
So the regular expression finds runs of digits followed by a , or ]. Matches returns the set of all matches (there are several here); we then go through each match, with some LINQ, and grab the group at index 1. "But we only made one group?" Yes, we specified only one group, but the group at index 0 is always the entire match, which in our case includes the , or ] that we don't want.
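The difference between group 0 and group 1 can be checked with a short sketch like the following (C# 9+ top-level statements, shortened sample input; `first` is just an illustrative name):

```csharp
// Top-level statements; assumes C# 9 or later.
using System;
using System.Linq;
using System.Text.RegularExpressions;

string s = "[123,234]";

// Group 0 is the whole match (digits plus the delimiter); group 1 is only the digits.
Match first = Regex.Match(s, @"(\d+)[,\]]");
Console.WriteLine(first.Groups[0].Value);    // 123,
Console.WriteLine(first.Groups[1].Value);    // 123

// Projecting group 1 of every match gives just the numbers.
string[] result = Regex.Matches(s, @"(\d+)[,\]]")
    .Cast<Match>()
    .Select(m => m.Groups[1].Value)
    .ToArray();
Console.WriteLine(string.Join("|", result)); // 123|234
```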

While you can and probably should use string.Split as other answers indicate, the question specifically asks whether you can do it with a regex, and yes, you can:
var r = new Regex(@"\d+", RegexOptions.Compiled);
var matches = r.Matches("[123,234,345,456,567,678,789,890,100]");

Related

C# Regex split() without removing the split condition character

I am splitting a string with regex using its Split() method.
var splitRegex = new Regex(@"[\s|{]");
string input = "/Tests/ShowMessage { 'Text': 'foo' }";
//second version of the input:
//string input = "/Tests/ShowMessage{ 'Text': 'foo' }";
string[] splittedText = splitRegex.Split(input, 2);
The string is just a sample of the input. The input comes in two forms: with a space before the {, or without one. I want to split the input at the { in order to get the following result:
/Tests/ShowMessage
{ 'Text': 'foo' }
If there is a space, the string is split there (the space is removed) and I get my desired result. But if there isn't a space, the string is split on the {, so the { is removed, which I don't want. How can I use Regex.Split() without removing the split condition character?
The square brackets create a character class, which matches exactly one of the characters inside it. To get what you want, start by removing them.
To match any number of whitespace characters, add *, which gives \s*.
\s is a whitespace character
* means zero or more
To avoid removing the split condition character, you can use a lookahead assertion (?=...).
(?=...) is a positive lookahead assertion and (?!...) is a negative one
The combined regex looks like this: \s*(?={)
This is a really good and detailed documentation of all the different regex parts; you might want to have a look at it. Furthermore, you can test your regex easily and for free here.
In order not to include the curly brace in the match, you can put it into a lookahead:
\s*(?={)
That will match any amount of whitespace up to the position just before an opening curly brace.
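A minimal sketch of the lookahead split applied to both input forms from the question (C# 9+ top-level statements). The brace is written escaped here for readability; the variable names are illustrative.

```csharp
// Top-level statements; assumes C# 9 or later.
using System;
using System.Text.RegularExpressions;

// Optional whitespace, then a lookahead that requires '{' without consuming it.
var splitRegex = new Regex(@"\s*(?=\{)");

string withSpace = "/Tests/ShowMessage { 'Text': 'foo' }";
string withoutSpace = "/Tests/ShowMessage{ 'Text': 'foo' }";

// The count of 2 stops after the first split, so braces later in the input are left alone.
string[] a = splitRegex.Split(withSpace, 2);
string[] b = splitRegex.Split(withoutSpace, 2);

Console.WriteLine(a[0]);   // /Tests/ShowMessage
Console.WriteLine(a[1]);   // { 'Text': 'foo' }
Console.WriteLine(b[0]);   // /Tests/ShowMessage
Console.WriteLine(b[1]);   // { 'Text': 'foo' }
```

Because the lookahead is zero-width, the brace stays in the second part, and only the whitespace (if any) is consumed by the split.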
You can use regular string split, on "{" and trim the spaces off:
var bits = "/Tests/ShowMessage { 'Text': 'foo' }".Split("{", StringSplitOptions.RemoveEmptyEntries);
bits[0] = bits[0].TrimEnd();
bits[1] = "{" + bits[1];
If you want to go the regex route, you can add the { back if you change the regex a bit:
var splitRegex = new Regex(@"\s*{");
string input = "/Tests/ShowMessage { 'Text': 'foo' }";
//second version of the input:
//string input = "/Tests/ShowMessage{ 'Text': 'foo' }";
string[] splittedText = splitRegex.Split(input, 2);
splittedText[1] = "{" + splittedText[1];
It means "split at an occurrence of (zero or more whitespace characters followed by {)", so the split operation removes your spaces (which you want) and your { (which you don't want), but you can put the { back, confident that you will get what you want.
var splitedList = srt.Text.Replace(".", ".#").Replace("?", "?#").Replace("!", "!#").Split(new[] { "#"}, StringSplitOptions.RemoveEmptyEntries).ToList();
This will split the text on ., ! and ? without removing those characters. For a better result, replace # with some unique character that cannot occur in the text, for example '®'. That is all. Simple as that. No Regex.Split, which is slow and awkward when there are many different splitting criteria.
Passing "Hello. I'am dev!"
gives this result (the split condition characters are kept):
"Hello."
"I'am dev!"

How to split a string every time the character changes?

I'd like to turn a string such as abbbbcc into an array like this: [a,bbbb,cc] in C#. I have tried the regex from this Java question like so:
var test = "aabbbbcc";
var split = new Regex("(?<=(.))(?!\\1)").Split(test);
but this results in the sequence [a,a,bbbb,b,cc,c] for me. How can I achieve the same result in C#?
Here is a LINQ solution that uses Aggregate:
var input = "aabbaaabbcc";
var result = input
.Aggregate(" ", (seed, next) => seed + (seed.Last() == next ? "" : " ") + next)
.Trim()
.Split(' ');
It walks through the characters, comparing each one with the last one read; whenever it encounters a different character, it first appends a space to the accumulating string. Then I just split it all at the end using the normal String.Split. (This assumes the input itself contains no spaces.)
Result:
["aa", "bb", "aaa", "bb", "cc"]
I don't know how to get it done with split. But this may be a good alternative:
//using System.Linq;
var test = "aabbbbcc";
var matches = Regex.Matches(test, "(.)\\1*");
var split = matches.Cast<Match>().Select(match => match.Value).ToList();
There are several things going on here that are producing the output you're seeing:
1. The regex combines a positive lookbehind and a negative lookahead to find every position where the preceding character differs from the one that follows.
2. It creates a capture group for every match, and these are fed into the Split method as delimiters. The capture group is required by the negative lookahead, specifically the \1 backreference, which basically means "the value of the first capture group in the pattern", so it cannot be omitted.
3. Regex.Split, given a pattern containing one or more capture groups, includes the captured text in the returned array after every individual split.
Number 3 is why your string array looks odd: Split splits after the last a of the run, the text before that point becomes split[0], the captured delimiter follows at split[1], and so on.
There is no way to override this behaviour on calling Split.
Either compensation as per Gusman's answer or projecting the results of a Matches call as per Ruard's answer will get you what you want.
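The effect of a capture group on Regex.Split is easier to see with a simpler delimiter. The sketch below (C# 9+ top-level statements, made-up input) splits the same string with and without parentheses around the pattern:

```csharp
// Top-level statements; assumes C# 9 or later.
using System;
using System.Text.RegularExpressions;

// No capture group: the delimiters are discarded.
string[] plain = Regex.Split("a-b-c", "-");
Console.WriteLine(string.Join("|", plain));      // a|b|c

// With a capture group: each captured delimiter is inserted into the result.
string[] captured = Regex.Split("a-b-c", "(-)");
Console.WriteLine(string.Join("|", captured));   // a|-|b|-|c
```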
To be honest I don't exactly understand how that regex works, but you can "repair" the output very easily:
Regex reg = new Regex("(?<=(.))(?!\\1)", RegexOptions.Singleline);
var res = reg.Split("aaabbcddeee").Where((value, index) => index % 2 == 0 && value != "").ToArray();
You could do this easily with LINQ, but I don't think its runtime will be as good as the regex's.
It is a whole lot easier to read, though.
var myString = "aaabbccccdeee";
var splits = myString.ToCharArray()
.GroupBy(chr => chr)
.Select(grp => new string(grp.Key, grp.Count()));
This returns the values ['aaa', 'bb', 'cccc', 'd', 'eee'].
However, this won't work for a string like "aabbaa": you'll just get ["aaaa","bb"] as a result instead of ["aa","bb","aa"].
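The limitation is easy to reproduce. This sketch (C# 9+ top-level statements) runs the GroupBy approach and the Regex.Matches approach from the other answer on the same input, "aabbaa":

```csharp
// Top-level statements; assumes C# 9 or later.
using System;
using System.Linq;
using System.Text.RegularExpressions;

string input = "aabbaa";

// GroupBy merges all equal characters, regardless of where they appear.
string[] grouped = input.GroupBy(c => c)
    .Select(g => new string(g.Key, g.Count()))
    .ToArray();
Console.WriteLine(string.Join(",", grouped));   // aaaa,bb

// (.)\1* matches one character plus its immediate repeats, so separate runs stay separate.
string[] runs = Regex.Matches(input, @"(.)\1*")
    .Cast<Match>()
    .Select(m => m.Value)
    .ToArray();
Console.WriteLine(string.Join(",", runs));      // aa,bb,aa
```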

Split the string based on regular expression

I have an array of strings like Name, Groups[0].Id, Types[11].Name.
I want to filter the strings that have square brackets and split them into two parts. For example, Groups[0].Id into Groups and Id.
How can I find the strings that have square brackets using a regular expression?
You can try this
Regex.Split(input, @"\[.*?\][.]");
Just for splitting a single string like
string value = "Groups[0].Id";
use
string[] parts = Regex.Split(value, @"\[\d+\]\.");
Explanation: you have to escape the square bracket and dot characters with a backslash (they have special meanings within a regular expression), and \d+ will accept only a run of digits ('0'..'9') with at least one digit. The @ makes the pattern a verbatim string, so the backslashes do not need to be doubled.
Links:
A nice .NET regex test page is http://regexhero.net/
MSDN documentation on Regex: http://msdn.microsoft.com/en-us/library/8yttk7sy.aspx
I'm not sure whether you wanted to split the strings, which is implied by your question title, or filter the list, which seems to be what you're asking at the end. You can split each element of the array at the bracketed index and the following period with this regex. It does not assume that the indices are digits alone; for example, it will also allow an array keyed by strings.
Regex.Split(a, @"\[[^\]]+\]\.");
You can use LINQ to filter the array in one line.
string[] ary = new string[3] {"Name", "Groups[0].Id", "Types[11].Name" };
ary = ary.Where(a => Regex.Match(a, @"\[[^\]]+\]\.").Success).ToArray();
foreach (string str in ary)
{
Console.WriteLine(str);
}

Split by comma if that comma is not located between two double quotes

I am looking to split such string by comma :
field1:"value1", field2:"value2", field3:"value3,value4"
into a string[] that would look like:
0 field1:"value1"
1 field2:"value2"
2 field3:"value3,value4"
I am trying to do that with Regex.Split but can't seem to work out the regular expression.
It'll be much easier to do this with Matches than with Split, for example
string[] asYouWanted = Regex.Matches(input, @"[A-Za-z0-9]+:"".*?""")
.Cast<Match>()
.Select(m => m.Value)
.ToArray();
although if there is any chance of your values (or fields!) containing escaped quotes (or anything similarly tricky), then you might be better off with a proper CSV parser.
If you do have escaped quotes in your values, I think the following input and regex should work; give it a test:
@"field3:""value3\\"",value4""", @"[A-Za-z0-9]+:"".*?(?<=(?<!\\)(\\\\)*)"""
The added (?<=(?<!\\)(\\\\)*) is supposed to make sure that the " it stops matching on is preceded only by an even number of backslashes, as an odd number of backslashes means the quote is escaped.
Untested, but this should be OK:
string[] parts = input.Split(new string[] { "\"," }, StringSplitOptions.None);
Remember to add the " back on the end if you need it.
string[] arr = str.Split(new string[] { "\"," }, StringSplitOptions.None).Select(s => s + "\"").ToArray();
Split by ", as webnoob mentioned, then append the trailing " to each part using Select, and convert the result to an array.
try this
// (\w.+?):"(\w.+?)"
//
// Match the regular expression below and capture its match into backreference number 1 «(\w.+?)»
// Match a single character that is a “word character” (letters, digits, and underscores) «\w»
// Match any single character that is not a line break character «.+?»
// Between one and unlimited times, as few times as possible, expanding as needed (lazy) «+?»
// Match the characters “:"” literally «:"»
// Match the regular expression below and capture its match into backreference number 2 «(\w.+?)»
// Match a single character that is a “word character” (letters, digits, and underscores) «\w»
// Match any single character that is not a line break character «.+?»
// Between one and unlimited times, as few times as possible, expanding as needed (lazy) «+?»
// Match the character “"” literally «"»
try {
// Requires: using System.Collections.Generic;
Regex regObj = new Regex(@"(\w.+?):""(\w.+?)""");
Match matchResults = regObj.Match(sourceString);
// The number of matches is not known up front, so collect them in a list.
var results = new List<string>();
while (matchResults.Success) {
results.Add(matchResults.Value);
matchResults = matchResults.NextMatch();
}
string[] arr = results.ToArray();
} catch (ArgumentException ex) {
// Syntax error in the regular expression
}
The easiest built-in way is here. I checked it and it works fine. It splits "Hai,\"Hello,World\"" into {"Hai","Hello,World"}

C# Regex.Split - Subpattern returns empty strings

Hey, first time poster on this awesome community.
I have a regular expression in my C# application to parse an assignment of a variable:
NewVar = 40
which is entered in a Textbox. I want my regular expression to return (using Regex.Split) the name of the variable and the value, pretty straightforward. This is the Regex I have so far:
var r = new Regex(@"^(\w+)=(\d+)$", RegexOptions.IgnorePatternWhitespace);
var mc = r.Split(command);
My goal was to do the trimming of whitespace in the regex and not use the Trim() method on the returned values. Currently it works, but it returns an empty string at the beginning of the returned array and an empty string at the end.
Using the above input example, this is what's returned from Regex.Split:
mc[0] = ""
mc[1] = "NewVar"
mc[2] = "40"
mc[3] = ""
So my question is: why does it return an empty string at the beginning and the end?
Thanks.
The reason Regex.Split is returning four values is that you have exactly one match, so Regex.Split is returning:
All the text before your match, which is ""
All () groups within your match, which are "NewVar" and "40"
All the text after your match, which is ""
Regex.Split's primary purpose is to extract the text between matches of the regex; for example, you could use Regex.Split with a pattern of "[,;]" to split text on either commas or semicolons. In .NET Framework 1.0 and 1.1, Regex.Split only returned the split values, in this case "" and "", but in .NET Framework 2.0 it was modified to also include values matched by () within the regex, which is why you are seeing "NewVar" and "40" at all.
What you were looking for is Regex.Match, not Regex.Split. It will do exactly what you want:
var r = new Regex(@"^(\w+)=(\d+)$");
var match = r.Match(command);
// Groups[0] is the entire match; the captured groups start at index 1.
var varName = match.Groups[1].Value;
var valueText = match.Groups[2].Value;
Note that RegexOptions.IgnorePatternWhitespace means you can include extra spaces in your pattern; it has nothing to do with the matched text. Since you have no extra whitespace in your pattern, it is unnecessary.
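If the aim is to let the regex itself absorb the whitespace, one option is to add \s* around each part of the pattern. The following is only a sketch (C# 9+ top-level statements); the extended pattern and the padded sample input are assumptions, not taken from the question.

```csharp
// Top-level statements; assumes C# 9 or later.
using System;
using System.Text.RegularExpressions;

string command = "  NewVar = 40 ";

// \s* around each part consumes optional whitespace, so no Trim() is needed afterwards.
Match match = Regex.Match(command, @"^\s*(\w+)\s*=\s*(\d+)\s*$");

// Group 0 would be the whole match; the two captures are at index 1 and 2.
string varName = match.Groups[1].Value;
string valueText = match.Groups[2].Value;

Console.WriteLine(match.Success);  // True
Console.WriteLine(varName);        // NewVar
Console.WriteLine(valueText);      // 40
```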
From the docs, Regex.Split() uses the regular expression as the delimiter to split on. It does not split the captured groups out of the input string. Also, the IgnorePatternWhitespace option ignores unescaped whitespace in your pattern, not in the input.
Instead, try the following:
var r = new Regex(@"\s*=\s*");
var mc = r.Split(command);
Note that the whitespace is actually consumed as a part of the delimiter.
