Regex - extract rest of string after specific sequence - c#

I have a long string with random letters, numbers, and spaces.
I need a regular expression to pull out the part of the string that follows the character sequence AQ102.
For example:
string t = "kjdsjsk158dfdd 125.196.168.210helloAQ102Lab101 section2";
desired output:
Lab101 section2

Why not use
string s = t.Split("AQ102").Last();

Or, a regular expression as originally asked for:
Regex regEx = new Regex(@".*(AQ102.*)");
OR
Regex regEx = new Regex(@".*(AQ102)(.*)");
And you can get the match by doing the following:
Match match = regEx.Match(t);
With the second pattern, the text after AQ102 is in the second capture group (the single group of the first pattern also includes AQ102 itself):
match.Groups[2]
OR, if you're really confident:
string val = regEx.Match(t).Groups[2].Value;

Don't need Regex for this. A simple split should suffice:
string output = input.Split(new string[] { "AQ102" }, StringSplitOptions.None)[1];
Depending on how sure you are of your input, you may want to check that AQ102 exists first, or even count how many times it occurs. As I said, it depends on your scenario.
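The existence check mentioned above could look something like the following. This is only a minimal sketch and not taken from any of the answers: the helper name TextAfter is hypothetical, and it assumes that the text after the first occurrence of the marker is wanted and that null is an acceptable result when the marker is missing.

```csharp
using System;

public static class MarkerExample
{
    // Hypothetical helper: returns everything after the first occurrence of
    // marker, or null when marker does not occur in input.
    public static string TextAfter(string input, string marker)
    {
        int index = input.IndexOf(marker, StringComparison.Ordinal);
        if (index < 0)
        {
            return null; // marker not found, nothing to extract
        }
        // Skip past the marker itself and return the remainder.
        return input.Substring(index + marker.Length);
    }
}
```

Calling MarkerExample.TextAfter(t, "AQ102") on the string from the question yields Lab101 section2. Unlike indexing into the result of Split, this does not throw when the marker is absent.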

Related

Use regular expression in C# to select a specific occurrence from a string by limiting input

Using C#, I am stuck while trying to extract a specific string while limiting the part of the input that is matched. Here is my input string:
NPS_CNTY01_10112018_Adult_Submittal.txt
I would like to extract 01 after CNTY and ignore anything after 01.
So far I have this regex:
(?!NPS_CNTY)\d{2}
But the above regex gets many other digit matches from the input string. One approach I was considering was to limit the input to 9 characters to eventually get 01, but I have not been able to achieve that. Any help is appreciated.
I would like to add that the only variable data in this input string is:
NPS_CNTY[two digit county code excluding this bracket]_[date in MMDDYYYY format excluding the brackets]_Adult_Submittal.txt.
Also, please limit solutions to regexes.
The (?!NPS_CNTY)\d{2} pattern matches a location that is not immediately followed by NPS_CNTY and then matches 2 digits. Since two digits can never start an NPS_CNTY character sequence, the lookahead always succeeds and is redundant.
You may use a positive lookbehind like this to get 01:
var m = Regex.Match(s, @"(?<=NPS_CNTY)\d+");
var result = "";
if (m.Success)
{
result = m.Value;
}
Here, (?<=NPS_CNTY), a positive lookbehind, matches a location that is immediately preceded with NPS_CNTY and then \d+ matches 1 or more digits.
An equivalent solution using capturing mechanism is
var m = Regex.Match(s, @"NPS_CNTY(\d+)");
var result = "";
if (m.Success)
{
result = m.Groups[1].Value;
}
If the string always starts with NPS_CNTY and you have to extract 2 digits, then you don't need a regular expression. Just use the Substring() method:
string text = @"NPS_CNTY01_01141980_Adult_Submittal.txt";
string digits = text.Substring(8, 2);
EDIT:
In case you need to match N digits after NPS_CNTY you can use the following code:
string text = @"NPS_CNTY012_01141980_Adult_Submittal.txt";
// FirstOrDefault() requires: using System.Linq;
string digits = text.Replace("NPS_CNTY", string.Empty)
    .Split("_", StringSplitOptions.RemoveEmptyEntries)
    .FirstOrDefault();
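Because the question fixes the layout as NPS_CNTY, a two-digit county code, and then an underscore, the regex can also be made stricter than \d+. The following is a sketch under that assumption only; the helper name CountyCode is hypothetical, and returning null for names of any other shape is a design choice made here, not something the question asks for.

```csharp
using System.Text.RegularExpressions;

public static class CountyCodeExample
{
    // Hypothetical helper: returns the two-digit code that follows "NPS_CNTY"
    // at the start of the file name, or null when the name has another shape.
    public static string CountyCode(string fileName)
    {
        // ^NPS_CNTY   literal prefix, anchored at the start of the input
        // ([0-9]{2})  exactly two ASCII digits, captured as group 1
        // _           the underscore that must follow the county code
        Match m = Regex.Match(fileName, @"^NPS_CNTY([0-9]{2})_");
        return m.Success ? m.Groups[1].Value : null;
    }
}
```

For the input from the question, CountyCodeExample.CountyCode("NPS_CNTY01_10112018_Adult_Submittal.txt") gives 01, while a three-digit code such as NPS_CNTY012_... is rejected rather than silently truncated.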

Regex first digits occurrence

My task is to extract the first digits in the following string:
GLB=VSCA|34|speed|1|
My pattern is the following:
(?x:VSCA(\|){1}(\d.))
Basically I need to extract "34", the first occurrence of digits after "VSCA". With my pattern I obtain a group, but would it be possible to get only the number? This is my C# snippet:
string regex = #"(?x:VSCA(\|){1}(\d.))";
Regex rx = new Regex(regex);
string s = "GLB=VSCA|34|speed|1|";
if (rx.Match(s).Success)
{
var test = rx.Match(s).Groups[1].ToString();
}
You could match 34 (the first digits after VSCA) using a positive lookbehind (?<=VSCA\D*) to assert that what is on the left is VSCA followed by zero or more non-digit characters (\D*), and then match one or more digits with \d+:
(?<=VSCA\D*)\d+
If you need the pipe to come directly after VSCA, then you could include it in the lookbehind:
(?<=VSCA\|)\d+
This regex pattern: (?<=VSCA\|)\d+?(?=\|) will match only the number. (If your number can be negative / have decimal places you may want to use (?<=VSCA\|).+?(?=\|) instead)
You don't need Regex for this, you can simply split on the '|' character:
string s = "GLB=VSCA|34|speed|1|";
string[] parts = s.Split('|');
if(parts.Length >= 2)
{
Console.WriteLine(parts[1]); //prints 34
}
The benefit here is that you can access all parts of the original string based on the index:
[0] - "GLB=VSCA"
[1] - "34"
[2] - "speed"
[3] - "1"
While the other answers work really well, if you really must use a regular expression, or are interested in knowing how to get to that straight away you can use a named group for the number. Consider the following code:
string regex = #"(?x:VSCA(\|){1}(?<number>\d.?))";
Regex rx = new Regex(regex);
string s = "GLB:VSCA|34|speed|1|";
var match = rx.Match(s);
if(match.Success) Console.WriteLine(match.Groups["number"]);
How about using (?<=VSCA\|)[0-9]+ as the pattern?

Splitting of a string using Regex

I have string of the following format:
string test = "test.BO.ID";
My aim is to extract the part of the string that comes after the first dot.
So ideally I am expecting output as "BO.ID".
Here is what I have tried:
// Check for the first occurrence and take whatever comes after the dot
var output = Regex.Match(test, #"^(?=.).*?");
The output I am getting is empty.
What is the modification I need to make it for Regex?
You get an empty output because the pattern you have can match an empty string at the start of a string, and that is enough since .*? is a lazy subpattern and . matches any char.
Use (the value will be in Match.Groups[1].Value)
\.(.*)
or (with a lookbehind, to get the string as a Match.Value)
(?<=\.).*
A non-regex approach is to use String.Split with a count argument:
var s = "test.BO.ID";
var res = s.Split(new[] {"."}, 2, StringSplitOptions.None);
if (res.Length > 1)
    Console.WriteLine(res[1]);
If you only want the part after the first dot, you don't need a regex at all. Add 1 to the index so that the dot itself is not included:
x.Substring(x.IndexOf('.') + 1)
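To compare the approaches from this thread side by side, here is a small sketch that wraps each one in a method. The class and method names are made up for illustration, and returning null when the input contains no dot is an assumption added here; the answers above do not specify that case.

```csharp
using System;
using System.Text.RegularExpressions;

public static class AfterFirstDotExample
{
    // Lookbehind: the match itself is the text after the first dot.
    public static string WithRegex(string s)
    {
        Match m = Regex.Match(s, @"(?<=\.).*");
        return m.Success ? m.Value : null;
    }

    // Split into at most two pieces; the second piece keeps any later dots.
    public static string WithSplit(string s)
    {
        string[] parts = s.Split(new[] { '.' }, 2);
        return parts.Length > 1 ? parts[1] : null;
    }

    // IndexOf + Substring, skipping the dot itself.
    public static string WithIndexOf(string s)
    {
        int i = s.IndexOf('.');
        return i >= 0 ? s.Substring(i + 1) : null;
    }
}
```

All three return BO.ID for "test.BO.ID". The IndexOf version avoids both the regex engine and the array allocation of Split, which may matter if this runs in a tight loop.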

Extracting Numbers from String RegEx

I am really struggling with Regular Expressions and can't seem to extract the number from this string
"id":143331539043251,
I've tried with this ... but I'm getting compilation errors
var regex = new Regex(#""id:"\d+,");
Note that the full string contains other numbers I don't want. I want numbers between id: and the ending ,
Try this code:
var match = Regex.Match(input, #"\""id\"":(?<num>\d+)");
var yourNumber = match.Groups["num"].Value;
Then use the extracted number yourNumber as a string, or parse it to a numeric type.
If all you need is the digits, just match on that:
[0-9]+
Note that I am not using \d, because in the .NET regex engine it matches any Unicode decimal digit (such as Arabic-Indic digits), not only 0-9, unless RegexOptions.ECMAScript is set.
Update, following comments on the question and on this answer - the following regex will match the pattern and place the matched numbers in a capturing group:
@"""id"":([0-9]+),"
Used as:
Regex.Match(#"""id"":143331539043251,", #"""id"":([0-9]+),").Groups[1].Value
Which returns 143331539043251.
If you are open to using LINQ, try the following (C#). Note that it collects every digit in the string, not just the id:
string stringVariable = "123cccccbb---556876---==";
var f = (from a in stringVariable.ToCharArray() where Char.IsDigit(a) == true select a);
var number = String.Join("", f);
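One of the answers above suggests parsing the extracted digits to a numeric type. A possible way to combine the capture group with that step is sketched below. The helper name ExtractId is hypothetical, and the choice of long (with null for a missing or oversized value) is an assumption based on the sample value, which is too large for an int.

```csharp
using System.Globalization;
using System.Text.RegularExpressions;

public static class IdExample
{
    // Hypothetical helper: finds "id":<digits> in input and parses the digits.
    // Returns null when no id is present or the value does not fit in a long.
    public static long? ExtractId(string input)
    {
        // In a verbatim string, "" stands for one literal double quote.
        Match m = Regex.Match(input, @"""id"":([0-9]+)");
        if (!m.Success)
        {
            return null;
        }

        long value;
        // NumberStyles.None accepts digits only: no sign, spaces, or separators.
        bool ok = long.TryParse(m.Groups[1].Value, NumberStyles.None,
                                CultureInfo.InvariantCulture, out value);
        return ok ? value : (long?)null;
    }
}
```

For an input containing "id":143331539043251, this returns 143331539043251 as a long, and other numbers elsewhere in the input are ignored because the pattern requires the "id": prefix.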

match first digits before # symbol

How can I match the first digits before # in this line?
26909578#Sbrntrl_7x06-lilla.avi#356028416#2012-10-24 09:06#0#http://bitshare.com/files/dvk9o1oz/Sbrntrl_7x06-lilla.avi.html#[URL=http://bitshare.com/files/dvk9o1oz/Sbrntrl_7x06-lilla.avi.html]http://bitshare.com/files/dvk9o1oz/Sbrntrl_7x06-lilla.avi.html[/URL]#http://bitshare.com/files/dvk9o1oz/Sbrntrl_7x06-lilla.avi.html#http://bitshare.com/?f=dvk9o1oz#http://bitshare.com/delete/dvk9o1oz/4511e6f3612961f961a761adcb7e40a0/Sbrntrl_7x06-lilla.avi.html
I'm trying to get this number: 26909578
My attempt:
string text = #"26909578#Sbrntrl_7x06-lilla.avi#356028416#2012-10-24 09:06#0#http://bitshare.com/files/dvk9o1oz/Sbrntrl_7x06-lilla.avi.html#[URL=http://bitshare.com/files/dvk9o1oz/Sbrntrl_7x06-lilla.avi.html]http://bitshare.com/files/dvk9o1oz/Sbrntrl_7x06-lilla.avi.html[/URL]#http://bitshare.com/files/dvk9o1oz/Sbrntrl_7x06-lilla.avi.html#http://bitshare.com/?f=dvk9o1oz#http://bitshare.com/delete/dvk9o1oz/4511e6f3612961f961a761adcb7e40a0/Sbrntrl_7x06-lilla.avi.html";
MatchCollection m1 = Regex.Matches(text, #"(.+?)#", RegexOptions.Singleline);
But this outputs all of the text.
Make it explicit that it has to start at the beginning of the string:
@"^(.+?)#"
Alternatively, if you know that this will always be a number, restrict the possible characters to digits:
@"^\d+"
Alternatively use the function Match instead of Matches. Matches explicitly says, "give me all the matches", while Match will only return the first one.
Or, in a trivial case like this, you might also consider a non-RegEx approach. The IndexOf() method will locate the '#' and you could easily strip off what came before.
I even wrote a sscanf() replacement for C#, which you can see in my article A sscanf() Replacement for .NET.
If you don't want to use a regex, use a StringBuilder and just loop until you hit the #.
Like this:
StringBuilder sb = new StringBuilder();
string yourdata = "yourdata";
int i = 0;
// Stop at the first '#', or at the end of the string if there is none
while (i < yourdata.Length && yourdata[i] != '#')
{
    sb.Append(yourdata[i]);
    i++;
}
// When you reach the #, the StringBuilder holds the number you want, so return it with ToString()
string answer = sb.ToString();
The entire string (except the final url) is composed of segments that can be matched by (.+?)#, so you will get several matches. Retrieve only the first match from the collection returned by matching .+?(?=#)
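Putting the suggestions from this thread together, a sketch of two ways to get only the leading field might look like this. The class and method names are invented for illustration, and both methods assume that null is an acceptable result when the line does not have the expected shape.

```csharp
using System.Text.RegularExpressions;

public static class FirstFieldExample
{
    // Regex.Match returns only the first match, and ^ pins it to the start,
    // so later "...#" segments in the line are never considered.
    public static string LeadingDigits(string line)
    {
        Match m = Regex.Match(line, @"^[0-9]+(?=#)");
        return m.Success ? m.Value : null;
    }

    // Non-regex variant: everything before the first '#', digits or not.
    public static string BeforeFirstHash(string line)
    {
        int i = line.IndexOf('#');
        return i >= 0 ? line.Substring(0, i) : null;
    }
}
```

For a line that starts with 26909578#Sbrntrl_7x06-lilla.avi#..., both methods return 26909578. The regex version additionally rejects lines whose first field is not purely numeric.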
