I have a string that contains an array
string str = "array[0]=[1,a,3,4,asdf54,6];array[1]=[1aaa,2,4,k=6,2,8];array[2]=[...]";
I'd like to split it to get an array like this:
str[0] = "[1,a,3,4,asdf54,6]";
str[1] = "[1aaa,2,4,k=6,2,8]";
str[2] = ....
I've tried to use Regex.Split(str, @"\[\D+\]") but it didn't work.
Any suggestions?
Thanks
SOLUTION:
After seeing your answers, I used
var arr = Regex.Split(str, @"\];array\[[\d, -]+\]=\[");
This works just fine, thanks all!
var t = str.Split(';').Select(s => s.Split(new char[]{'='}, 2)).Select(s => s.Last()).ToArray();
In regex, \d matches any digit, whilst \D matches anything that is not a digit. I assume your use of the latter is erroneous. Additionally, you should allow your regex to also match negation signs, commas, and spaces, using the character class [\d\-, ]. You can also include a lookbehind for the = character, written as (?<=\=), in order to avoid matching the [0], [1], [2], ... indices.
string str = "array[0]=[1,2,3,4,5,6];array[1]=[1,2,4,6,2,8];array[2]=[...]";
string[] results = Regex.Matches(str, @"(?<=\=)\[[\d\-, ]+\]")
    .Cast<Match>()
    .Select(m => m.Value)
    .ToArray();
Try this, using a regular-expression lookbehind to grab the relevant parts of your string.
using System;
using System.Linq;
using System.Text.RegularExpressions;
namespace RegexSplit
{
    class Program
    {
        static void Main(string[] args)
        {
            string str = "array[0]=[1,2,3,4,5,6];array[1]=[1,2,4,6,2,8];array[2]=[...]";
            // The lookbehind only accepts a [...] group that directly follows "]=".
            Regex r = new Regex(@"(?<=\]=)(\[.+?\])");
            string[] results = r.Matches(str).Cast<Match>().Select(p => p.Groups[1].Value).ToArray();
            foreach (string s in results)
                Console.WriteLine(s);
        }
    }
}
BONUS: convert to int[][] if you are fancy.
int[][] ints = results.Select(p => p.Split(new[] { '[', ',', ']' }, StringSplitOptions.RemoveEmptyEntries)
        // Omits the '...' entry from your sample. You could change the regex pattern to match only digits, but that isn't what you wanted (I don't think).
        .Where(s => int.TryParse(s, out _))
        .Select(s => int.Parse(s)).ToArray())
    .ToArray();
Regex would be an option, but it would be a bit complicated. Assuming you don't have a parser for your input, you can try the following:
- Split the string by ; characters, and you'd get a string array (string[]): "array[0]=[1,2,3,4,5,6]", "array[1]=[1,2,4,6,2,8]", "array[2]=[...]". Let's call it list.
Then for each of the elements in that array (assuming the input is in order), do this:
- Find the index of ]= in the current element; let that be x.
- Take the substring of the current element starting at index x + 2. Let's call it sub.
- Assign sub as the current element of your array, e.g. if you are iterating with a regular for loop such as for (int i = 0; i < list.Length; i++) {...}:
list[i] = sub.
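The steps above can be sketched roughly as follows. This is a minimal illustration rather than a robust parser; the names SplitByIndex and ExtractArrays are made up for the example, and leaving elements without "]=" unchanged is an assumption the steps do not cover.

```csharp
using System;

public static class SplitByIndex
{
    // Hypothetical helper that follows the steps above.
    public static string[] ExtractArrays(string input)
    {
        // Step 1: one element per "array[i]=[...]" entry.
        string[] list = input.Split(';');
        for (int i = 0; i < list.Length; i++)
        {
            // Step 2: locate "]=" inside the current element.
            int x = list[i].IndexOf("]=", StringComparison.Ordinal);
            // Assumption: elements without "]=" are left unchanged.
            if (x >= 0)
            {
                // Steps 3 and 4: keep everything after "]=" and store it back.
                list[i] = list[i].Substring(x + 2);
            }
        }
        return list;
    }
}
```

Because only the first "]=" is searched for, a value such as k=6 inside the brackets is kept intact.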
I know it is a dirty and error-prone solution: e.g. if the input is array[0] =[1,2... instead of array[0]=[1,2,..., it won't work because of the extra space, but if your input matches that exact pattern (no extra spaces, no newlines etc.), it will do the job.
UPDATE: cosset's answer seems to be the most practical and easiest way to achieve your result, especially if you are familiar with LINQ.
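string[] output = Regex.Matches(input, @"(?<!array)\[.*?\]")
    .Cast<Match>()
    .Select(x => x.Value)
    .ToArray();
OR
// The brackets around the index must be escaped; the empty entry before "array[0]=" is dropped.
string[] output = Regex.Split(input, @";?array\[\d+\]=")
    .Where(s => s.Length > 0)
    .ToArray();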
string str = "array[0]=[1,a,3,4,asdf54,6];array[1]=[1aaa,2,4,k=6,2,8];array[2]=[2,3,2,3,2,3=3k3k]";
string[] m1 = str.Split(';');
List<string> m3 = new List<string>();
foreach (string ms in m1)
{
    // Split only on the first '=' so values such as "k=6" stay intact.
    string[] m2 = ms.Split(new[] { '=' }, 2);
    m3.Add(m2[1]);
}
Related
I have strings of this type:
"10a10", "10b5641", "5a1121", "438z2a5f"
and I need to remove anything after the FIRST a-zA-Z char in the string (the symbol itself should be removed as well). What could be a solution?
Examples of results I expect:
"10a10" returns "10"
"10b5641" returns "10"
"5a1121" returns "5"
"438z2a5f" returns "438"
You could use a regular expression with Regex.Replace, something like:
string str = "10a10";
str = Regex.Replace(str, @"[a-zA-Z].*", "");
Console.WriteLine(str);
will output:
10
Basically, it matches the first a-zA-Z character and everything after it (.* matches any character zero or more times) and removes that from the string.
An easy to understand approach would be to use the String.IndexOfAny Method to find the Index of the first a-zA-Z char, and then use the String.Substring Method to cut the string accordingly.
To do so, you would create an array containing all a-zA-Z characters and pass it as the argument to String.IndexOfAny. Then you pass 0 and the result of String.IndexOfAny as arguments to String.Substring. Note that IndexOfAny returns -1 when no letter is found, so check for that before calling Substring.
I am pretty sure there are more elegant ways to do this, but this seems the most basic approach to me, so it's worth mentioning.
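A minimal sketch of this IndexOfAny/Substring approach; LetterCut and CutAtFirstLetter are hypothetical names, and returning the whole string when it contains no letter is an assumed choice:

```csharp
public static class LetterCut
{
    // All ASCII letters a-z and A-Z, passed to IndexOfAny.
    private static readonly char[] Letters =
        "abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ".ToCharArray();

    public static string CutAtFirstLetter(string s)
    {
        int index = s.IndexOfAny(Letters);
        // IndexOfAny returns -1 when no letter is found; keep the whole string then.
        return index < 0 ? s : s.Substring(0, index);
    }
}
```

For example, CutAtFirstLetter("438z2a5f") returns "438".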
You could do so using Linq as follows.
var result = new string(strInput.TakeWhile(x => !char.IsLetter(x)).ToArray());
var sList = new List<string> { "10a10", "10b5641", "5a1121", "438z2a5f" };
foreach (string s in sList)
{
    // Keep characters up to (not including) the first letter.
    string number = new string(s.TakeWhile(c => !char.IsLetter(c)).ToArray());
    Console.WriteLine(number);
}
Either LINQ:
var result = string.Concat(strInput
    .TakeWhile(c => !((c >= 'a' && c <= 'z') || (c >= 'A' && c <= 'Z'))));
Or regular expression:
using System.Text.RegularExpressions;
...
var result = Regex.Match(strInput, "^[^A-Za-z]*").Value;
In both cases, starting from the beginning of strInput, we take characters until an a-z or A-Z character occurs.
Demo:
string[] tests = new[] {
"10a10", "10b5641", "5a1121", "438z2a5f"
};
string demo = string.Join(Environment.NewLine, tests
.Select(test => $"{test,-10} returns \"{Regex.Match(test, "^[^A-Za-z]*").Value}\""));
Console.Write(demo);
Outcome:
10a10 returns "10"
10b5641 returns "10"
5a1121 returns "5"
438z2a5f returns "438"
I am trying to extract the address number from an input string, but my code still produces unwanted results when a string with multiple numbers comes in.
Is there a way to tell the Regex to return the matches as an array, or something like that, so I can recognize whether there was more than one number in the original string?
String theNumbers = String.Join(String.Empty, Regex.Matches(inputString, @"\d+").OfType<Match>().Select(m => m.Value));
I tried it another way as well, but Regex.Split generates empty strings in the array, and just filtering them out seems a bit hacky to me.
String[] extractedNumbersArray = Regex.Split(inputString, @"\D+");
Hope this helps:
using System;
using System.Linq;
using System.Text.RegularExpressions;
public class Program
{
    public static void Main()
    {
        var inputString = "1 2 3";
        // Each run of digits becomes its own match, captured in the named group "nr".
        var values = Regex
            .Matches(inputString, @"(?<nr>\d+)")
            .OfType<Match>()
            .Select(m => m.Groups["nr"].Value)
            .ToArray();
        Console.WriteLine("Multiple numbers: " + (values.Length > 1 ? "yep" : "nope"));
        foreach (var v in values)
        {
            Console.WriteLine(v);
        }
    }
}
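For comparison, a small sketch of why the Regex.Split approach from the question yields empty strings: Split returns whatever lies before the first match and after the last one, so an input that starts or ends with non-digits produces an empty entry at that end. The address string and the NumberSplit helper are hypothetical:

```csharp
using System.Linq;
using System.Text.RegularExpressions;

public static class NumberSplit
{
    // Splits on runs of non-digits. A non-digit run at the very start or end
    // of the input leaves an empty string at that end of the array.
    public static string[] RawSplit(string input) => Regex.Split(input, @"\D+");

    // Dropping the empty entries leaves only the digit runs.
    public static string[] Numbers(string input) =>
        RawSplit(input).Where(s => s.Length > 0).ToArray();
}
```

With "Street 12, 3rd floor", RawSplit gives "", "12", "3", "", which is why matching the digits directly, as above, is the cleaner option.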
I would like to use the ((?!(SEPARATOR)).)* regex pattern for splitting a string.
using System;
using System.Text.RegularExpressions;
public class Program
{
    public static void Main()
    {
        var separator = "__";
        var pattern = String.Format("((?!{0}).)*", separator);
        var regex = new Regex(pattern);
        foreach (var item in regex.Matches("first__second"))
            Console.WriteLine(item);
    }
}
It works fine when the SEPARATOR is a single character, but when it is longer than 1 character I get an unexpected result. In the code above, the second matched string is "_second" instead of "second". How should I modify my pattern to skip the whole unmatched separator?
My real problem is splitting lines while skipping line separators inside quotes. My line separator is not a predefined value; it can be, for example, "\r\n".
You can do something like this:
using System;
using System.Text.RegularExpressions;
public class Example
{
    public static void Main()
    {
        string input = "plum--pear";
        string pattern = "-"; // Split on hyphens
        string[] substrings = Regex.Split(input, pattern);
        foreach (string match in substrings)
        {
            Console.WriteLine("'{0}'", match);
        }
    }
}
// The method displays the following output:
// 'plum'
// ''
// 'pear'
The .NET regex does not support matching a piece of text other than a specific multicharacter string. In PCRE, you would use the (*SKIP)(*FAIL) verbs, but they are not supported in the native .NET regex library. You could use PCRE.NET, but .NET regex can usually handle such scenarios well with Regex.Split.
If you need to, say, match all but [anything here], you could use
var res = Regex.Split(s, @"\[[^][]*]").Where(m => !string.IsNullOrEmpty(m));
If the separator is a simple literal fixed string like __, just use String.Split.
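For instance, String.Split has an overload that takes a string[] of separators, so a multi-character separator is consumed as a whole. A minimal sketch; the LiteralSplit wrapper is only for illustration:

```csharp
using System;

public static class LiteralSplit
{
    // Splits on a fixed, possibly multi-character separator such as "__".
    public static string[] SplitOn(string input, string separator) =>
        input.Split(new[] { separator }, StringSplitOptions.None);
}
```

SplitOn("first__second", "__") yields "first" and "second", with no stray underscore.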
As for your real problem, it seems all you need is
var res = Regex.Matches(s, "(?:\"[^\"]*\"|[^\r\n\"])+")
    .Cast<Match>()
    .Select(m => m.Value)
    .ToList();
It matches one or more (due to the final +) occurrences of either a quoted section, i.e. ", zero or more chars other than ", and then " (the "[^"]*" branch), or (|) any char other than CR, LF and " (see [^\r\n"]).
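A self-contained sketch of that matching approach, assuming CR and LF are the only line-break characters and quotes are always balanced. The QuotedLines class name is hypothetical:

```csharp
using System.Linq;
using System.Text.RegularExpressions;

public static class QuotedLines
{
    // One match per line: a run of quoted sections (which may contain CR/LF)
    // and single chars that are not CR, LF or a quote.
    private static readonly Regex LinePattern =
        new Regex("(?:\"[^\"]*\"|[^\r\n\"])+");

    public static string[] SplitLines(string text) =>
        LinePattern.Matches(text).Cast<Match>().Select(m => m.Value).ToArray();
}
```

Because of the final +, empty lines produce no entry, and an unbalanced quote cannot be consumed by either branch, so a line would be cut at that quote.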
I am trying to use regex to split the following string into 2 arrays.
String str1 = "First Second [insideFirst] Third Forth [insideSecond] Fifth";
How do I split str1 into 2 arrays that look like this:
ary1 = ['First Second','Third Forth','Fifth'];
ary2 = ['insideFirst','insideSecond'];
Here is my solution:
string str = "First Second [insideFirst] Third Forth [insideSecond] Fifth";
MatchCollection matches = Regex.Matches(str, @"\[.*?\]");
string[] arr = matches.Cast<Match>()
    .Select(m => m.Groups[0].Value.Trim(new char[] { '[', ']' }))
    .ToArray();
foreach (string s in arr)
{
    Console.WriteLine(s);
}
string[] arr1 = Regex.Split(str, @"\[.*?\]")
    .Select(x => x.Trim())
    .ToArray();
foreach (string s in arr1)
{
    Console.WriteLine(s);
}
Output
insideFirst
insideSecond
First Second
Third Forth
Fifth
Please try the code below. It works fine for me.
String str1 = "First Second [insideFirst] Third Forth [insideSecond] Fifth";
var output = String.Join(";", Regex.Matches(str1, @"\[(.+?)\]")
    .Cast<Match>()
    .Select(m => m.Groups[1].Value));
string[] strInsideBreacket = output.Split(';');
for (int i = 0; i < strInsideBreacket.Length; i++)
{
    str1 = str1.Replace("[", ";");
    str1 = str1.Replace("]", "");
    str1 = str1.Replace(strInsideBreacket[i], "");
}
string[] strRemaining = str1.Split(';');
Here, strInsideBreacket is the array of bracketed values, insideFirst and insideSecond, and strRemaining is the array of First Second, Third Forth and Fifth (with surrounding spaces; call Trim() on each to remove them).
Thanks
Try this solution:
String str1 = "First Second [insideFirst] Third Forth [insideSecond] Fifth";
var allWords = str1.Split(new char[] { '[', ']' }, StringSplitOptions.RemoveEmptyEntries);
var result = allWords.GroupBy(x => x.Contains("inside")).ToArray();
The idea is to first get all the segments and then group them.
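The same split-then-group idea can be made independent of the literal word "inside": splitting on '[' and ']' without removing empty entries makes the pieces alternate between outside and inside the brackets. A sketch, assuming brackets are balanced and never nested; BracketSplit and SplitBrackets are hypothetical names:

```csharp
using System.Linq;

public static class BracketSplit
{
    // Even indices are outside the brackets, odd indices are inside.
    // Assumption: brackets are balanced and not nested.
    public static (string[] Outside, string[] Inside) SplitBrackets(string input)
    {
        string[] parts = input.Split('[', ']');
        string[] outside = parts.Where((p, i) => i % 2 == 0)
            .Select(p => p.Trim())
            .Where(p => p.Length > 0)
            .ToArray();
        string[] inside = parts.Where((p, i) => i % 2 == 1).ToArray();
        return (outside, inside);
    }
}
```

Keeping the empty entries is what preserves the alternation, e.g. when the string starts with a bracket.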
It seems to me that "user2828970" asked a question with an example, not with literal text he wanted to parse. In my mind, he could very well have asked this question:
I am trying to use regex to split a string like so.
var exampleSentence = "I had 185 birds but 20 of them flew away";
var regexSplit = Regex.Split(exampleSentence, #"\d+");
The result of regexSplit is: I had, birds but, of them flew away.
However, I also want to know the value which resulted in the second string splitting away from its preceding text, and the value which resulted in the third string splitting away from its preceding text. i.e.: I want to know about 185 and 20.
The string could be anything, and the pattern to split by could be anything. The answer should not have hard-coded values.
Well, this simple function will perform that task. The code can be optimized to compile the regex, or re-organized to return multiple collections or different objects. But this is (nearly) the way I use it in production code.
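// Extension methods must live in a non-nested static class; the class name here is arbitrary.
public static class RegexSplitExtensions
{
    public static List<Tuple<string, string>> RegexSplitDetail(this string text, string pattern)
    {
        var splitAreas = new List<Tuple<string, string>>();
        var regexResult = Regex.Matches(text, pattern);
        var regexSplit = Regex.Split(text, pattern);
        // Pair each piece with the match that preceded it (none for the first piece).
        // Assumes the pattern has no capture groups, otherwise Split also returns the group text.
        for (var i = 0; i < regexSplit.Length; i++)
            splitAreas.Add(new Tuple<string, string>(i == 0 ? null : regexResult[i - 1].Value, regexSplit[i]));
        return splitAreas;
    }
}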
...
var result = exampleSentence.RegexSplitDetail(@"\d+");
This would return a single collection which looks like this:
{ null, "I had "}, // First value, had no value splitting it from a predecessor
{"185", " birds but "}, // Second value, split from the preceding string by "185"
{ "20", " of them flew away"} // Third value, split from the preceding string by "20"
Since this is a .NET question, and apart from the approach I favour in my other answer, you can also capture the split values in another VERY simple way. You then just need to write a function that uses the results as you see fit.
var exampleSentence = "I had 185 birds but 20 of them flew away";
var regexSplit = Regex.Split(exampleSentence, @"(\d+)");
The result of regexSplit is: I had, 185, birds but, 20, of them flew away. As you can see, the split values exist within the split results.
Note the subtle difference compared to my other answer: in this regex split, I put a capture group around the entire pattern, (\d+). You can't do that!?... can you?
Using a capture group in a split inserts the text of every capture group in each split value between the split results. This can get messy, so I don't suggest doing it. It also forces anybody using your function(s) to know that they have to wrap their regexes in a capture group.
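A quick sketch of that behaviour: each capture group's text is inserted, in group-number order, between the surrounding pieces, so nested groups make the same split value appear more than once. CaptureSplitDemo is only an illustrative name:

```csharp
using System.Text.RegularExpressions;

public static class CaptureSplitDemo
{
    // One group around the whole pattern: each split value appears once.
    public static string[] OneGroup(string s) => Regex.Split(s, @"(\d+)");

    // Nested groups: group 1 ("12") and group 2 ("1") are both inserted,
    // so the split value shows up twice in different forms.
    public static string[] NestedGroups(string s) => Regex.Split(s, @"((\d)\d)");
}
```

NestedGroups("x12y") returns "x", "12", "1", "y", which is the kind of mess described above.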
I have a Regex to split out words, operators and brackets in simple logic statements (e.g. "WORD1 & WORD2 | (WORd_3 & !word_4 )"). The Regex I've come up with is "([A-Za-z0-9_]+)|([&!\|()]{1})". Here is a quick test program.
using System;
using System.Text.RegularExpressions;
namespace ConsoleApplication1
{
    class Program
    {
        static void Main(string[] args)
        {
            Console.WriteLine("* Test Project *");
            string testExpression = "!(LIONV6 | NOT_superCHARGED) &RHD";
            string removedSpaces = testExpression.Replace(" ", "");
            string[] expectedResults = new string[] { "!", "(", "LIONV6", "|", "NOT_superCHARGED", ")", "&", "RHD" };
            string[] splits = Regex.Split(removedSpaces, @"([A-Za-z0-9_]+)|([&!\|()]{1})");
            Console.WriteLine("Expected\n{0}\nActual\n{1}", expectedResults.AllElements(), splits.AllElements());
            Console.WriteLine("*** Any Key to finish ***");
            Console.ReadKey();
        }
    }
    public static class Extensions
    {
        // Formats an array as 'a','b',... for display.
        public static string AllElements(this string[] str)
        {
            string output = "";
            if (str != null)
            {
                foreach (string item in str)
                {
                    output += "'" + item + "',";
                }
            }
            return output;
        }
    }
}
The Regex does the required job of splitting out words and operators into an array in the right sequence, but the result array contains many empty elements, and I can't work out why. It's not a serious problem, as I just ignore empty elements when consuming the array, but I'd like Regex to do all the work if possible, including ignoring spaces.
Try this:
string[] splits = Regex.Split(removedSpaces, @"([A-Za-z0-9_]+)|([&!\|()]{1})").Where(x => x != String.Empty).ToArray();
The empty strings are just because of the way Split works. From the documentation:
If multiple matches are adjacent to one another, an empty string is inserted into the array.
What Split does as standard is treat your matches as delimiters. So in effect what would be returned as standard is a lot of empty strings between the adjacent matches (compare what you might expect if you split ",,,," on ",": you'd probably expect all the gaps).
Also from that documentation page:
If capturing parentheses are used in a Regex.Split expression, any captured text is included in the resulting string array.
This is the reason you are getting what you actually want in there at all. So effectively it is now showing you the text that has been split (all the empty strings) with the delimiters too.
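To make the two documented rules concrete, here is a small sketch. AdjacentSplitDemo is an illustrative name, and the pattern is the capturing word/operator pattern from the question written with plain groups:

```csharp
using System.Text.RegularExpressions;

public static class AdjacentSplitDemo
{
    // Splitting ",,,," on ',' keeps every gap: five empty strings.
    public static string[] CommaGaps() => ",,,,".Split(',');

    // With a capturing pattern, each delimiter is kept, but the (empty) text
    // before, between and after adjacent delimiters is kept too.
    public static string[] CapturingSplit(string s) =>
        Regex.Split(s, @"([A-Za-z0-9_]+)|([&!|()])");
}
```

For "!(A|B)" there are six adjacent matches, so CapturingSplit returns the six tokens interleaved with seven empty strings.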
What you are doing may well be better done by just matching the regular expression (with Regex.Matches), since what is in your regular expression is actually what you want to match.
Something like this (using some linq to convert to a string array):
Regex.Matches(testExpression, @"([A-Za-z0-9_]+)|([&!\|()]{1})")
    .Cast<Match>()
    .Select(x => x.Value)
    .ToArray();
Note that because this is taking positive matches it doesn't need the spaces to be removed first.
var matches = Regex.Matches(removedSpaces, @"(\w+|[&!|()])");
foreach (var match in matches)
Console.Write("'{0}', ", match); // '!', '(', 'LIONV6', '|', 'NOT_superCHARGED', ')', '&', 'RHD',
Actually, you don't need to delete spaces before extracting your identifiers and operators, the regex I proposed will ignore them anyway.