Split string by different marks - c#

How can I split a string on several different delimiters, for example a dot (.) and a hyphen (-), in C#?
string str = "sally-vikram.dean.sarah-ray";
I want to do this without first replacing every delimiter with the same character:
str = str.Replace("-", ".");
and then splitting on the dot:
string[] words = str.Split('.');
to get:
sally
vikram
dean
sarah
ray

string.Split can actually take an array of values:
string[] words = str.Split('.', '-');

For your use case, a regex character class (MSDN) is a good choice:
string[] words = Regex.Split(str, "[.-]");
Note: Since - is also used to define a character range like a-z, it's good practice to put the - at the end of the character class. Otherwise, just escape it, e.g. \-.
This is most appropriate if you expect to need further delimiters or other requirements, find the regex more readable, and performance isn't an issue (Regex.Split is much slower than the String.Split equivalent).
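To make the two approaches above easy to compare, here is a minimal sketch. The class and method names (DelimiterSplitDemo, SplitWithString, SplitWithRegex) are made up for illustration and are not part of any library.

```csharp
using System;
using System.Text.RegularExpressions;

// Hypothetical helper class, for illustration only.
public static class DelimiterSplitDemo
{
    // Splits on '.' or '-' using the params char[] overload of String.Split.
    public static string[] SplitWithString(string input)
    {
        return input.Split('.', '-');
    }

    // Same split with a regex character class; '-' is placed last so it is
    // treated as a literal rather than as a range operator.
    public static string[] SplitWithRegex(string input)
    {
        return Regex.Split(input, "[.-]");
    }
}
```

For the sample string "sally-vikram.dean.sarah-ray", both methods should return the five names sally, vikram, dean, sarah and ray.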

Related

Regex to find 2 words before and after a keyword

I need to find 2 words before and after a keyword like below:
Here is a testing string with some more testing strings.
Keyword - with
Result - "testing string with some more"
Here is the regex I prepared, but it does not handle the spaces in between.
(?:\S+\s)?\S*(?:\S+\s)?\S*text\S*(?:\s\S+)?\S*(?:\s\S+)?
When you're using \S*, this means non-whitespace characters, so your spaces will get in the way.
I suggest the following regex: (\S+)\s*(\S+)\s*with\s*(\S+)\s*(\S+), which means:
(\S+): text that doesn't include whitespace characters (a word).
\s*: zero or more whitespace characters (in between the words)
After using it, you'll get 4 groups that correspond to the 2 words before the with and the 2 words after it.
Try the regex here: https://regex101.com/r/Mk67s2/1
Try this:
([a-zA-Z]+\s+){2}with(\s+[a-zA-Z]+){2}
Try below:
string testString = "Here is a testing string with some more testing strings.";
string keyword = "with";
string pattern = $@"\w+\s+\w+\s+{keyword}\s+\w+\s+\w+";
string match = Regex.Match(testString, pattern).Value;
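A slightly more defensive variant of the snippet above could look like the sketch below. It assumes the keyword is a plain word, and it adds Regex.Escape and word boundaries, which the original answers do not use. KeywordContext and TwoWordsAround are hypothetical names.

```csharp
using System;
using System.Text.RegularExpressions;

// Hypothetical helper class, for illustration only.
public static class KeywordContext
{
    // Returns the keyword with the two words before and after it,
    // or an empty string when there is no such match.
    public static string TwoWordsAround(string text, string keyword)
    {
        // Regex.Escape keeps regex metacharacters in the keyword literal;
        // \b ensures the keyword is matched as a whole word (assumes the
        // keyword starts and ends with a word character).
        string pattern = $@"\w+\s+\w+\s+\b{Regex.Escape(keyword)}\b\s+\w+\s+\w+";
        Match match = Regex.Match(text, pattern);
        return match.Success ? match.Value : string.Empty;
    }
}
```

Called with the test sentence from the question and the keyword "with", this should return "testing string with some more".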

Regex.Split on "() " and "?"

myString= "First?Second Third";
String[] result = Regex.Split(myString, @"( )\?");
Should result:
First,
Second,
Third
What am I missing? (I also need brackets to split on for something else)
I guess with ( ), you meant whitespace. You don't need any capturing group there. Just use alternation, or a character class:
String[] result = Regex.Split(myString, @"\s|\?");
// OR
String[] result = Regex.Split(myString, @"[\s?]");
Using string methods:
myString= "First?Second Third";
String[] result = myString.Split(' ','?');
I'm not quite sure what you are trying to do with the quotes. Remember that in C# parentheses denote a logical group in your regular expression; they do not escape a space. What you want is to split on an explicit set of characters, which is denoted by brackets []. You should use the following pattern to split:
String[] result = Regex.Split(myString, @"[\?\s]");
Note that \? is an escaped question mark (as you had in your original). White-space characters are matched by \s. Thus, my solution essentially says to split the string on any of the characters listed inside the [], namely ? (escaped as \?) and white-space (written as \s).
EDIT AFTER MORE INFO FROM OP:
I also saw, after answering this post, that you edited the top comment to say you wanted a logical grouping for the white-space, in which case I would go with:
String[] result = Regex.Split(myString, @"[\?(\s)]");
Be aware that parentheses inside [] are literal characters rather than a group, so this pattern also splits on ( and ).
You need to surround those characters with [] to create a character class: [\s\?]. This will split on:
a space
?
You can use \s to handle "any" whitespace char.
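One thing the character-class answers above do not cover is input where delimiters appear back to back or at the ends, which produces empty entries. The sketch below is one possible way to handle that, assuming empty tokens are unwanted. WhitespaceOrQuestionSplit and SplitTokens are hypothetical names.

```csharp
using System;
using System.Linq;
using System.Text.RegularExpressions;

// Hypothetical helper class, for illustration only.
public static class WhitespaceOrQuestionSplit
{
    // Splits on runs of whitespace and/or '?' and drops empty tokens
    // that appear when the input starts or ends with a delimiter.
    public static string[] SplitTokens(string input)
    {
        return Regex.Split(input, @"[\s?]+")
                    .Where(token => token.Length > 0)
                    .ToArray();
    }
}
```

With "First?Second Third" this should give First, Second and Third, and the same three tokens for an input such as "First? Second\tThird?".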

Regular Expression - Remove zeroes inside an expression

I need to remove leading zeroes from the numerical part of an expression (using .net 2.0 C# Regex class).
Ex:
PAR0000034 -> PAR34
WP0003204 -> WP3204
I tried the following:
//keep starting characters, get rid of leading zeroes, keep remaining digits
string result = Regex.Replace(inputStr, "^(.+)(0+)(/d*)", "$1$3", RegexOptions.IgnoreCase)
Obviously, it did not work. I need a bit of help to find the mistake.
You don't need a regular expression for that, the Split method can do that for you.
Splitting on '0', removing empty entries (i.e. between the multiple zeroes), and limiting the result to two strings will give you the two strings before and after the leading zeroes. Then you just put those two strings together again:
string result = String.Concat(
input.Split(new char[] { '0' }, 2, StringSplitOptions.RemoveEmptyEntries)
);
In your expression the .+ part is greedy, so it swallows most of the zeroes as well. Also, use a backslash instead of a slash for the digit class: \d
string result = Regex.Replace(inputStr, @"^([^0]+)(0+)(\d*)", "$1$3");
Or use look behind instead:
string result = Regex.Replace(inputStr, "(?<=[a-zA-Z])0+", "");
This works for me:
Regex.Replace("PPP00001001", "([^0]*)0+(.*)", "$1$2");
The phrase "leading zeroes" is confusing, since the zeroes you're talking about aren't actually at the beginning of the string. But if I understand you correctly, you want this:
string result = Regex.Replace(inputStr, "^(.*?)0+", "$1");
There are actually several ways to do it, with and without regex, but the above is probably the shortest and easiest to understand. The important part is the .*? lazy quantifier. This will ensure that it a) finds only the first string of zeroes, and b) deletes all the "leading" zeroes in the string.
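As one more variation on the answers above, here is a sketch that assumes the input always starts with a prefix of ASCII letters followed by digits (as in the PAR0000034 and WP0003204 examples). The lookahead is an addition of mine so that at least one digit is kept when the numeric part is all zeroes. ZeroStripper and StripZeroesAfterPrefix are hypothetical names.

```csharp
using System;
using System.Text.RegularExpressions;

// Hypothetical helper class, for illustration only.
public static class ZeroStripper
{
    // Removes the zeroes that directly follow the leading letter prefix.
    // (?=\d) requires a digit to remain after the removed zeroes, so an
    // all-zero number such as "PAR000" becomes "PAR0" rather than "PAR".
    public static string StripZeroesAfterPrefix(string input)
    {
        return Regex.Replace(input, @"^([A-Za-z]+)0+(?=\d)", "$1");
    }
}
```

Under those assumptions, "PAR0000034" should become "PAR34" and "WP0003204" should become "WP3204", with zeroes inside the number left alone.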

Regex to isolate a specific substring

I have this string I have retrieved from a File.ReadAllText:
6 11 rows processed
As you can see there is always an integer specifying the line number in this document. What I am interested in is the integer that comes after it and the words "rows processed". So in this case I am only interested in the substring "11 rows processed".
So, knowing that each line will start with an integer and then some white space, I need to be able to isolate the integer that follows it and the words "rows processed" and return that to a string by itself.
I have been told this is easy to do with Regex, but so far I haven't the faintest clue how to build it.
You don't need regular expressions for this. Just split on the whitespace:
var fields = s.Split(new char[0], StringSplitOptions.RemoveEmptyEntries);
Console.WriteLine(String.Join(" ", fields.Skip(1)));
Here, I am using the fact that if you pass an empty array as the char [] parameter to String.Split, it splits on all whitespace.
This should work for what you need:
\d+(.*)
This searches for 1 or more digits (\d+) and then it puts everything afterwards in a group:
. = any character
* = repeater: zero or more of the preceding element (here, any character)
() = grouping
However, Jason is correct in that you only need to use a split function.
If you need to use a Regex it would be like this:
string result = null;
Match match = Regex.Match(row, @"^\s*\d+\s*(.*)");
if (match.Success)
result = match.Groups[1].Value;
The regex matches from the start of the row: first spaces, if any, then digits, and then more spaces. Finally, it captures the rest of the line, which is returned as the result.
This is done easily with Regex.Replace() using the following regular expression...
^\d+\s+
So it'd be something like this:
return Regex.Replace(text, @"^\d+\s+", "");
Basically you're just trimming the first number \d and the whitespace \s that follows.
Example in PHP (C# regex should be compatible):
$line = "6 11 rows processed";
$resp = preg_match("/[0-9]+\s+(.*)/",$line,$out);
echo $out[1];
I hope I understood your point.
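For completeness, the regex approaches above could be wrapped up roughly as follows in C#. This is a sketch that assumes the text has already been broken into individual lines. RowLineParser and StripLineNumber are hypothetical names.

```csharp
using System;
using System.Text.RegularExpressions;

// Hypothetical helper class, for illustration only.
public static class RowLineParser
{
    // Drops the leading line number and the whitespace after it,
    // returning the rest of the line. Lines that do not start with
    // a number are returned unchanged.
    public static string StripLineNumber(string line)
    {
        Match match = Regex.Match(line, @"^\s*\d+\s+(.*)$");
        return match.Success ? match.Groups[1].Value : line;
    }
}
```

For the line "6 11 rows processed" this should return "11 rows processed".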

regex split to implement tokenizer

I have a string with all possible chars and now I want to split it by following
"+"
",OU="
can anyone show me how to do this with regex.split?
I tried many times, but still no luck
I'm using C#
I think you can use String.Split, which lets you specify multiple separators.
string[] separator = new string[]{"+", ",OU="};
string[] resultTokens = testString.Split(separator, StringSplitOptions.None);
For the Regex version:
string[] split = Regex.Split(yourstring, @"\+|,OU=");
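Putting both variants side by side, a minimal sketch might look like this. The sample input used in the note below is invented for illustration, and OuTokenizer and its methods are hypothetical names.

```csharp
using System;
using System.Text.RegularExpressions;

// Hypothetical helper class, for illustration only.
public static class OuTokenizer
{
    private static readonly string[] Separators = { "+", ",OU=" };

    // Plain string separators: no escaping needed.
    public static string[] SplitWithString(string input)
    {
        return input.Split(Separators, StringSplitOptions.None);
    }

    // Regex variant: '+' must be escaped. The group is non-capturing
    // because Regex.Split adds the text of capturing groups to the
    // result array, which would put the delimiters in the output.
    public static string[] SplitWithRegex(string input)
    {
        return Regex.Split(input, @"(?:\+|,OU=)");
    }
}
```

For an input such as "CN=a+CN=b,OU=x,OU=y", both methods should return CN=a, CN=b, x and y. If the delimiters themselves are needed as tokens, a capturing group could be used instead.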
You may have needed a backslash in front of the "+" to treat it as a literal, and you're probably defining the regex using a string, so the string itself will want the backslash character escaped. It can be easier to read to use square brackets instead.
"([+]|,[Oo][Uu]=)"
