Splitting of a string using Regex

Splitting of a string using Regex - c#

I have a string in the following format:
string test = "test.BO.ID";
My aim is to get the part of the string that comes after the first dot.
So ideally I am expecting the output "BO.ID".
Here is what I have tried:
// Check for the first occurrence and take whatever comes after the dot
var output = Regex.Match(test, @"^(?=.).*?");
The output I am getting is empty.
What modification do I need to make to the regex?

You get an empty output because the pattern can match an empty string at the start of the string: (?=.) only checks that some character follows, . matches any character, and the lazy .*? is satisfied with zero characters.
Use a capture group (the value will be in Match.Groups[1].Value):
\.(.*)
or a lookbehind (to get the string as Match.Value):
(?<=\.).*
A non-regex approach is to use String.Split with a count argument:
var s = "test.BO.ID";
var res = s.Split(new[] {"."}, 2, StringSplitOptions.None);
if (res.Length > 1)
    Console.WriteLine(res[1]);

If you only want the part after the first dot you don't need a regex at all:
x.Substring(x.IndexOf('.') + 1)
The + 1 skips the dot itself; without it the result would be ".BO.ID".
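As a rough sketch of how the approaches in this thread compare, the following puts the capture-group regex, the lookbehind regex, and the Substring approach side by side. The helper name AfterFirstDot is made up for illustration, and returning the whole string when there is no dot is an assumption, not something the question specifies.

```csharp
// Sketch written as C# 9+ top-level statements; AfterFirstDot is a hypothetical helper.
using System;
using System.Text.RegularExpressions;

// Returns everything after the first '.', or the whole input when there is no dot
// (IndexOf returns -1 in that case, so Substring starts at index 0).
static string AfterFirstDot(string s) => s.Substring(s.IndexOf('.') + 1);

string test = "test.BO.ID";

// Capture group: the value is in Groups[1].
string viaGroup = Regex.Match(test, @"\.(.*)").Groups[1].Value;

// Lookbehind: the value is the match itself.
string viaLookbehind = Regex.Match(test, @"(?<=\.).*").Value;

Console.WriteLine(viaGroup);            // BO.ID
Console.WriteLine(viaLookbehind);       // BO.ID
Console.WriteLine(AfterFirstDot(test)); // BO.ID
```

For a fixed single-character delimiter like this, the string-based version is arguably the easiest to read; the regex versions pay off once the delimiter becomes a pattern.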

Related

Regex - extract rest of string after specific sequence

I have a long string with random letters, numbers, and spaces.
I need a regex expression to pull out the part of the string after the sequence of characters and numbers --> AQ102.
For example :
string t = "kjdsjsk158dfdd 125.196.168.210helloAQ102Lab101 section2";
desired output:
Lab101 section2

Why not use
string s = t.Split("AQ102").Last();

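Or, a regular expression as originally asked for:
Regex regEx = new Regex(@".*(AQ102.*)");
OR
Regex regEx = new Regex(@".*(AQ102)(.*)");
And you can get the matches by doing the following:
MatchCollection matches = regEx.Matches(t);
Either pattern matches the whole string once, so the text you want is in a capture group of the first match. With the second pattern, group 2 holds everything after AQ102 (group 1 of the first pattern still includes AQ102 itself):
matches[0].Groups[2]
OR, if you're really confident:
string val = regEx.Matches(t)[0].Groups[2].Value;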

Don't need Regex for this. A simple split should suffice:
string output = input.Split(new string[] { "AQ102" }, StringSplitOptions.None)[1];
Depending on how sure you are of your input, you may want to check that AQ102 exists first, or even count how many times it occurs. But as I said, it depends on your scenario.
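One way to add the existence check mentioned above is an IndexOf-based helper. This is only a sketch: TryGetAfterMarker is a hypothetical name, and taking the text after the first occurrence of the marker is an assumption (Split(...).Last() would instead take the text after the last occurrence).

```csharp
// Sketch written as C# 9+ top-level statements; TryGetAfterMarker is a hypothetical helper.
using System;

// Returns true and the text following the first occurrence of marker,
// or false (with an empty result) when the marker is not present.
static bool TryGetAfterMarker(string input, string marker, out string rest)
{
    int index = input.IndexOf(marker, StringComparison.Ordinal);
    rest = index < 0 ? string.Empty : input.Substring(index + marker.Length);
    return index >= 0;
}

string t = "kjdsjsk158dfdd 125.196.168.210helloAQ102Lab101 section2";

bool found = TryGetAfterMarker(t, "AQ102", out string rest);
Console.WriteLine(found ? rest : "(marker not found)"); // Lab101 section2
```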

match first digits before # symbol

How to match all first digits before # in this line
26909578#Sbrntrl_7x06-lilla.avi#356028416#2012-10-24 09:06#0#http://bitshare.com/files/dvk9o1oz/Sbrntrl_7x06-lilla.avi.html#[URL=http://bitshare.com/files/dvk9o1oz/Sbrntrl_7x06-lilla.avi.html]http://bitshare.com/files/dvk9o1oz/Sbrntrl_7x06-lilla.avi.html[/URL]#http://bitshare.com/files/dvk9o1oz/Sbrntrl_7x06-lilla.avi.html#http://bitshare.com/?f=dvk9o1oz#http://bitshare.com/delete/dvk9o1oz/4511e6f3612961f961a761adcb7e40a0/Sbrntrl_7x06-lilla.avi.html
I'm trying to get this number: 26909578
My attempt:
string text = @"26909578#Sbrntrl_7x06-lilla.avi#356028416#2012-10-24 09:06#0#http://bitshare.com/files/dvk9o1oz/Sbrntrl_7x06-lilla.avi.html#[URL=http://bitshare.com/files/dvk9o1oz/Sbrntrl_7x06-lilla.avi.html]http://bitshare.com/files/dvk9o1oz/Sbrntrl_7x06-lilla.avi.html[/URL]#http://bitshare.com/files/dvk9o1oz/Sbrntrl_7x06-lilla.avi.html#http://bitshare.com/?f=dvk9o1oz#http://bitshare.com/delete/dvk9o1oz/4511e6f3612961f961a761adcb7e40a0/Sbrntrl_7x06-lilla.avi.html";
MatchCollection m1 = Regex.Matches(text, @"(.+?)#", RegexOptions.Singleline);
but it outputs all of the text.

Make it explicit that the match has to start at the beginning of the string:
@"^(.+?)#"
Alternatively, if you know that this will always be a number, restrict the possible characters to digits:
@"^\d+"
Alternatively, use the Match method instead of Matches. Matches explicitly says "give me all the matches", while Match will only return the first one.
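The difference between Matches and an anchored Match can be seen in a small sketch. The input here is a shortened stand-in for the line in the question (an assumption made to keep the example readable), not the full original string.

```csharp
// Sketch written as C# 9+ top-level statements; the input is a shortened sample.
using System;
using System.Text.RegularExpressions;

string text = "26909578#Sbrntrl_7x06-lilla.avi#356028416#2012-10-24 09:06#0";

// Matches returns every segment that is followed by a '#'.
MatchCollection all = Regex.Matches(text, @"(.+?)#");
Console.WriteLine(all.Count); // 4

// Match with a pattern anchored at the start returns only the leading digits.
Match first = Regex.Match(text, @"^\d+");
Console.WriteLine(first.Success ? first.Value : "(no leading digits)"); // 26909578
```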

Or, in a trivial case like this, you might also consider a non-RegEx approach. The IndexOf() method will locate the '#' and you could easily strip off what came before.
I even wrote a sscanf() replacement for C#, which you can see in my article A sscanf() Replacement for .NET.

If you don't want to use a regex, use a StringBuilder and just loop until you hit the #.
Like this:
StringBuilder sb = new StringBuilder();
string yourdata = "yourdata";
int i = 0;
// Stop at the first '#' or at the end of the string, whichever comes first
while (i < yourdata.Length && yourdata[i] != '#')
{
    sb.Append(yourdata[i]);
    i++;
}
// When the loop reaches the '#', the StringBuilder holds the number you want, so return it with ToString()
string answer = sb.ToString();

The entire string (except the final url) is composed of segments that can be matched by (.+?)#, so you will get several matches. Retrieve only the first match from the collection returned by matching .+?(?=#)

Regular Expression to get all characters before "-"

How can I get the string before the character "-" using regular expressions?
For example, I have "text-1" and I want to return "text".

There are several ways to achieve this.
string text = "Foobar-test";
1. Regex: match everything up to the first "-"
Match result = Regex.Match(text, @"^.*?(?=-)");
^ matches from the start of the string
.*? matches any character (.), zero or more times (*), but as few as possible (?)
(?=-) until the next character is a "-" (this is a positive lookahead)
2. Regex: match anything that is not a "-" from the start of the string
Match result2 = Regex.Match(text, @"^[^-]*");
[^-]* matches any character that is not a "-", zero or more times
2.1. Regex: match anything that is not a "-" from the start of the string up to a "-"
Match result21 = Regex.Match(text, @"^([^-]*)-");
This only matches if there is a dash in the string, and the result is then found in capture group 1.
3. Split on "-"
string[] result3 = text.Split('-');
The result is an array; the part before the first "-" is its first item.
4. Substring up to the first "-"
string result4 = text.Substring(0, text.IndexOf("-"));
This takes the substring of text from the start up to the first occurrence of "-" (text.IndexOf("-")).
All of these give the same result, which you can print with:
Console.WriteLine(result);
Console.WriteLine(result2);
Console.WriteLine(result21.Groups[1]);
Console.WriteLine(result3[0]);
Console.WriteLine(result4);
I would prefer the first method.
You also need to think about the behavior when there is no dash in the string. The fourth method will throw an exception in that case, because text.IndexOf("-") will be -1. Methods 1 and 2.1 will return nothing, and methods 2 and 3 will return the complete string.
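To make the no-dash case explicit for the Substring approach, a guard on the IndexOf result is enough. The sketch below uses a hypothetical helper named BeforeFirstDash and assumes that returning the whole string is the desired fallback; returning an empty string or throwing would be equally valid choices depending on the scenario.

```csharp
// Sketch written as C# 9+ top-level statements; BeforeFirstDash is a hypothetical helper.
using System;

// Text before the first '-', or the whole string when there is no dash.
static string BeforeFirstDash(string text)
{
    int index = text.IndexOf('-');
    return index < 0 ? text : text.Substring(0, index);
}

Console.WriteLine(BeforeFirstDash("Foobar-test"));  // Foobar
Console.WriteLine(BeforeFirstDash("Foobar"));       // Foobar
Console.WriteLine(BeforeFirstDash("-test").Length); // 0
```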

Here is my suggestion - it's quite simple as that:
[^-]*

This is something like the regular expression you need:
([^-]*)-
Quick tests in JavaScript:
/([^-]*)-/.exec('text-1')[1] // 'text'
/([^-]*)-/.exec('foo-bar-1')[1] // 'foo'
/([^-]*)-/.exec('-1')[1] // ''
/([^-]*)-/.exec('quux')[1] // explodes

I don't think you need a regex to achieve this. I would look at the Substring method along with the IndexOf method. If you need more help, add a comment showing what you have attempted and I will offer more help.

You could just use another non-regex method. Someone suggested using Substring, but you could also use Split:
string testString = "my-string";
string[] splitString = testString.Split('-');
string resultingString = splitString[0]; // my
See http://msdn.microsoft.com/en-US/library/ms228388%28v=VS.80%29.aspx for another good example.

If you want to use Regex in .NET:
Regex rx = new Regex(@"^([\w]+)(\-)*");
var match = rx.Match("thisis-thefirst");
var text = match.Groups[1].Value;
Assert.AreEqual("thisis", text);

Find all word and space characters up to and including a -
^[\w ]+-

C# Regex.Split - Subpattern returns empty strings

Hey, first time poster on this awesome community.
I have a regular expression in my C# application to parse an assignment of a variable:
NewVar = 40
which is entered in a Textbox. I want my regular expression to return (using Regex.Split) the name of the variable and the value, pretty straightforward. This is the Regex I have so far:
var r = new Regex(@"^(\w+)=(\d+)$", RegexOptions.IgnorePatternWhitespace);
var mc = r.Split(command);
My goal was to do the trimming of whitespace in the Regex and not use the Trim() method on the returned values. Currently it works, but it returns an empty string at the beginning of the returned array and an empty string at the end.
Using the above input example, this is what's returned from Regex.Split:
mc[0] = ""
mc[1] = "NewVar"
mc[2] = "40"
mc[3] = ""
So my question is: why does it return an empty string at the beginning and the end?
Thanks.

The reason Regex.Split is returning four values is that you have exactly one match, so Regex.Split is returning:
All the text before your match, which is ""
All () groups within your match, which are "NewVar" and "40"
All the text after your match, which is ""
Regex.Split's primary purpose is to extract the text between matches of the regex; for example, you could use Regex.Split with a pattern of "[,;]" to split text on either commas or semicolons. When the pattern contains capturing parentheses, the captured text is also included in the returned array, which is why you are seeing "NewVar" and "40" at all. (The documentation notes that .NET Framework 1.0 and 1.1 handled patterns with several capture groups differently: if the first group did not match, text captured by the later groups was left out. From .NET Framework 2.0 on, all captured text is returned.)
What you were looking for is Regex.Match, not Regex.Split. It will do exactly what you want:
var r = new Regex(@"^(\w+)=(\d+)$");
var match = r.Match(command);
var varName = match.Groups[1].Value;
var valueText = match.Groups[2].Value;
Groups[0] is the whole match, so the captures start at index 1.
Note that RegexOptions.IgnorePatternWhitespace means you can include extra spaces in your pattern; it has nothing to do with the matched text. Since you have no extra whitespace in your pattern, it is unnecessary.
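Since the original goal was to have the regex absorb the whitespace rather than calling Trim(), one possible variation is to add \s* around each token. This is a sketch under that assumption; TryParseAssignment is a hypothetical helper name, and the pattern still accepts only digits as the value, as in the question.

```csharp
// Sketch written as C# 9+ top-level statements; TryParseAssignment is a hypothetical helper.
using System;
using System.Text.RegularExpressions;

// \s* around each token lets the pattern swallow surrounding whitespace,
// so the captured groups need no Trim().
static bool TryParseAssignment(string command, out string name, out string value)
{
    Match match = Regex.Match(command, @"^\s*(\w+)\s*=\s*(\d+)\s*$");
    name = match.Success ? match.Groups[1].Value : string.Empty;
    value = match.Success ? match.Groups[2].Value : string.Empty;
    return match.Success;
}

bool ok = TryParseAssignment("  NewVar = 40 ", out string varName, out string valueText);
Console.WriteLine(ok ? $"{varName} = {valueText}" : "not an assignment"); // NewVar = 40
```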

From the docs, Regex.Split() uses the regular expression as the delimiter to split on. It does not split the captured groups out of the input string. Also, the IgnorePatternWhitespace option ignores unescaped whitespace in your pattern, not in the input.
Instead, try the following:
var r = new Regex(@"\s*=\s*");
var mc = r.Split(command);
Note that the whitespace is actually consumed as a part of the delimiter.
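The two behaviors of Regex.Split discussed in this thread can be checked side by side. This is a small sketch; the input without spaces in the first call is an assumption, chosen so that the question's pattern actually matches.

```csharp
// Sketch written as C# 9+ top-level statements.
using System;
using System.Text.RegularExpressions;

// One match covering the whole input: empty text before it,
// the two captured groups, then empty text after it.
string[] withGroups = Regex.Split("NewVar=40", @"^(\w+)=(\d+)$");
Console.WriteLine(withGroups.Length); // 4

// Splitting on the delimiter itself; the whitespace around '=' is consumed with it.
string[] parts = Regex.Split("NewVar = 40", @"\s*=\s*");
Console.WriteLine(string.Join("|", parts)); // NewVar|40
```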

RegEx Problem using .NET

I have a little problem with a regex pattern in C#. Here are the rules:
input: 1234567
expected output: 123/1234567
Rules:
Get the first three digits of the input. //123
Add /
Append the original input. //123/1234567
The expected output should look like this: 123/1234567
Here's my regex pattern:
Regex rx = new Regex(@"((\w{1,3})(\w{1,7}))");
but the output is incorrect: 123/4567

I think this is what you're looking for:
string s = @"1234567";
s = Regex.Replace(s, @"(\w{3})(\w+)", @"$1/$1$2");
Instead of trying to match part of the string, then match the whole string, just match the whole thing in two capture groups and reuse the first one.

It's not clear why you need a RegEx for this. Why not just do:
string x = "1234567";
string result = x.Substring(0, 3) + "/" + x;

Another option is:
string s = Regex.Replace("1234567", @"^\w{3}", "$&/$&");
That would capture 123 and replace it with 123/123, leaving the tail of 4567.
^\w{3} - matches the first 3 characters.
$& - replaces with the whole match.
You could also do @"^(\w{3})", "$1/$1" if you are more comfortable with it; it is better known.
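For comparison, here is a short sketch that runs the three suggestions from this thread on the sample input. The variable names are arbitrary, and the Substring version assumes the input has at least three characters.

```csharp
// Sketch written as C# 9+ top-level statements.
using System;
using System.Text.RegularExpressions;

string input = "1234567";

// Two capture groups, reusing the first one in the replacement.
string viaGroups = Regex.Replace(input, @"(\w{3})(\w+)", "$1/$1$2");

// Whole-match substitution: $& stands for the matched text.
string viaWholeMatch = Regex.Replace(input, @"^\w{3}", "$&/$&");

// Plain string operations; throws if the input is shorter than three characters.
string viaSubstring = input.Substring(0, 3) + "/" + input;

Console.WriteLine(viaGroups);     // 123/1234567
Console.WriteLine(viaWholeMatch); // 123/1234567
Console.WriteLine(viaSubstring);  // 123/1234567
```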

Use positive look-ahead assertions, as they don't 'consume' characters in the current input stream, while still capturing input into groups:
Regex rx = new Regex(@"(?=(?'group1'\w{1,3}))(?=(?'group2'\w{1,7}))");
group1 should be 123, group2 should be 1234567.
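A quick way to try the look-ahead idea is to wrap each named group in its own (?=...) so both groups start capturing at the same position. The exact pattern below is an assumption about what this answer intends, offered as a sketch rather than the answerer's verified code.

```csharp
// Sketch written as C# 9+ top-level statements; the pattern is an assumed reading of the answer.
using System;
using System.Text.RegularExpressions;

string input = "1234567";

// Each look-ahead captures from the same starting position without consuming input.
Match m = Regex.Match(input, @"(?=(?'group1'\w{1,3}))(?=(?'group2'\w{1,7}))");

string output = m.Groups["group1"].Value + "/" + m.Groups["group2"].Value;
Console.WriteLine(output); // 123/1234567
```

The overall match is empty here because look-aheads consume nothing; only the named groups carry the text.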
