Regular expression of partial url

I have the following:
https://www.example.com/my-suburl/sub-dept/xx-xxxx-xx-yyyyyy/
I'm trying to extract the 'yyyyyy' part of the URL. So far I have:
(.*)\/sub-dept\/(.*[^\/])\/([^\/]*)$
Its capture groups match:
https://www.example.com/my-suburl
and
xx-xxxx-xx-yyyyyy
However, as I said, I need a match for just the 'yyyyyy' part.

NON-C#-BASED SOLUTION
If the xx/yyyyyy placeholders are digits in the actual strings, just use
\d+(?=\/$)
Otherwise, use
[^-\/]*(?=\/?$)
See Demo 1 and Demo 2
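Neither pattern depends on JavaScript; with .NET's Regex class, match.Value holds just the last segment. A minimal C# sketch, where the all-digit URL is a hypothetical input for the \d+ variant:

```csharp
using System;
using System.Text.RegularExpressions;

// URL from the question, plus a hypothetical variant whose placeholders are digits.
string url = "https://www.example.com/my-suburl/sub-dept/xx-xxxx-xx-yyyyyy/";
string numericUrl = "https://www.example.com/my-suburl/sub-dept/12-3456-78-901234/";

// \d+(?=\/$): a run of digits immediately followed by the trailing slash.
string digitsOnly = Regex.Match(numericUrl, @"\d+(?=\/$)").Value;

// [^-\/]*(?=\/?$): the last run of characters that are neither '-' nor '/',
// optionally followed by one trailing slash.
string lastSegment = Regex.Match(url, @"[^-\/]*(?=\/?$)").Value;

Console.WriteLine(digitsOnly);  // 901234
Console.WriteLine(lastSegment); // yyyyyy
```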
Note that JavaScript engines that predate ES2018 do not support lookbehind, so if you must check that /sub-dept/ comes before the substring you need, you will have to rely on a capturing group:
\/sub-dept\/[^\/]*-([^-\/]*)\/?
See yet another demo
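The capturing-group fallback also works in C#; the wanted part is then read from Groups[1] rather than match.Value. A minimal sketch using the URL from the question:

```csharp
using System;
using System.Text.RegularExpressions;

string url = "https://www.example.com/my-suburl/sub-dept/xx-xxxx-xx-yyyyyy/";

// No lookbehind: consume /sub-dept/ and everything up to the last '-',
// and capture the run of non-'-', non-'/' characters that follows.
Match m = Regex.Match(url, @"\/sub-dept\/[^\/]*-([^-\/]*)\/?");
string captured = m.Success ? m.Groups[1].Value : string.Empty;

Console.WriteLine(m.Value);  // /sub-dept/xx-xxxx-xx-yyyyyy/
Console.WriteLine(captured); // yyyyyy
```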
ORIGINAL ANSWER
Here is a regex you can use
(?<=/sub-dept/[^/]*-)[^/-]*(?=/$)
See demo
The regex matches a substring of zero or more characters other than / or - that is:
(?<=/sub-dept/[^/]*-) - preceded by /sub-dept/, then zero or more characters other than /, and then a hyphen;
(?=/$) - followed by a / right at the end of the string.
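.NET supports variable-length lookbehind, so this pattern can be used as-is and match.Value contains only the wanted part. A minimal C# sketch; the second URL is a hypothetical counterexample without /sub-dept/:

```csharp
using System;
using System.Text.RegularExpressions;

const string Pattern = @"(?<=/sub-dept/[^/]*-)[^/-]*(?=/$)";

string url = "https://www.example.com/my-suburl/sub-dept/xx-xxxx-xx-yyyyyy/";
// The lookbehind and lookahead are checked but not consumed,
// so match.Value is only the text between the last '-' and the trailing '/'.
string value = Regex.Match(url, Pattern).Value;

// Hypothetical URL without /sub-dept/: the lookbehind fails, so there is no match.
bool otherMatches = Regex.IsMatch("https://www.example.com/other/xx-yy/", Pattern);

Console.WriteLine(value);        // yyyyyy
Console.WriteLine(otherMatches); // False
```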
Alternatively, there is a non-regex way: trim the slashes, split the string by /, take the last part and split it by -. Here is an example (without error/null checking, for demo's sake):
var result = text.Trim('/').Split('/').LastOrDefault().Split('-').LastOrDefault(); // LastOrDefault requires using System.Linq;
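A sketch of the same non-regex approach with basic null/empty checks added. LastDashPart is a hypothetical helper name, and returning null for unusable input is an assumed convention:

```csharp
using System;
using System.Linq;

// Hypothetical helper: the last '-'-separated piece of the last path segment,
// or null when there is nothing to return.
static string? LastDashPart(string? text)
{
    if (string.IsNullOrEmpty(text)) return null;

    // Drop leading/trailing slashes, then take the last path segment.
    string? lastSegment = text.Trim('/').Split('/').LastOrDefault();
    if (string.IsNullOrEmpty(lastSegment)) return null;

    return lastSegment.Split('-').LastOrDefault();
}

string? result = LastDashPart("https://www.example.com/my-suburl/sub-dept/xx-xxxx-xx-yyyyyy/");
Console.WriteLine(result); // yyyyyy
```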

Related

Variable-length lookbehind for backslashes

What seemed like a simple task didn't work as expected...
I'm trying to match \$\w+\b unless it's preceded by an odd number of backslashes.
Examples (only $result should be in the match):
This $result should be matched
This \$result should not be matched
This \\$result should be matched
This \\\$result should not be matched
etc...
The following pattern works:
(?<!\\)(\\\\)*\$\w+\b
However, the even runs of backslashes are included in the match, which I don't want. So I'm trying to achieve this purely with a variable-length lookbehind, but nothing I've tried so far works.
Can any regex virtuoso here lend a hand?

You may use the following pattern:
(?<!(?:^|[^\\])\\(?:\\\\)*)\$\w+\b
Demo.
Breakdown of the lookbehind, i.e., not preceded by:
(?:^|[^\\]) - the beginning of the string/line, or any character other than a backslash;
\\ - then one backslash;
(?:\\\\)* - then an even number of backslashes (including zero).
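A quick C# check of this lookbehind against the four example lines from the question, written as verbatim strings so each backslash is literal:

```csharp
using System;
using System.Text.RegularExpressions;

// Pattern from the answer: $word not preceded by an odd run of backslashes.
var pattern = new Regex(@"(?<!(?:^|[^\\])\\(?:\\\\)*)\$\w+\b");

// The four example lines from the question.
string[] lines =
{
    @"This $result should be matched",
    @"This \$result should not be matched",
    @"This \\$result should be matched",
    @"This \\\$result should not be matched",
};

foreach (string line in lines)
{
    Match m = pattern.Match(line);
    Console.WriteLine(m.Success ? $"'{m.Value}' <- {line}" : $"(no match) <- {line}");
}
```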

Looks like asking the question helped me answer my own question.
The part I don't want to be matched has to be wrapped with a positive lookbehind.
(?<=(?<!\\)(\\\\)*)\$\w+\b
Also works if the $result is at the start of the line.
If anyone has more optimal solutions, shoot!
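The effect of moving the condition into a lookbehind is visible in match.Value. A small C# sketch comparing the original pattern with the self-answer on one of the example lines:

```csharp
using System;
using System.Text.RegularExpressions;

string line = @"This \\$result should be matched";

// Original pattern: the even run of backslashes is consumed, so it is part of the match.
string before = Regex.Match(line, @"(?<!\\)(\\\\)*\$\w+\b").Value;

// Self-answer: the same condition inside a positive lookbehind, so only $result is consumed.
string after = Regex.Match(line, @"(?<=(?<!\\)(\\\\)*)\$\w+\b").Value;

Console.WriteLine(before); // \\$result
Console.WriteLine(after);  // $result
```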

This regular expression gets the wanted text in the third capture group:
(^| )(\\\\)*(\$\w+\b)
Explanation:
(^| ) Either beginning of line or a space
(\\\\)* An even number of backslash characters, including none
( Start of capture group 3
\$\w+\b The wanted text
) End of capture group 3
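Because this version consumes the leading space and backslashes, match.Value is not the wanted text; it has to be read from group 3. A minimal C# sketch:

```csharp
using System;
using System.Text.RegularExpressions;

const string Pattern = @"(^| )(\\\\)*(\$\w+\b)";

Match m = Regex.Match(@"This \\$result should be matched", Pattern);

// match.Value includes the space and the two backslashes;
// group 3 holds only the wanted text.
Console.WriteLine("[" + m.Value + "]"); // [ \\$result]
Console.WriteLine(m.Groups[3].Value);   // $result

// An odd number of backslashes leaves no match at all.
bool oddMatches = Regex.IsMatch(@"This \$result should not be matched", Pattern);
Console.WriteLine(oddMatches);          // False
```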

Split credit card number into 4 chunks using Regex lookahead?

I want to chunk a credit card number (in my case I always have 16 digits) into 4 chunks of 4 digits.
I've managed to do it with a positive lookahead:
var s="4581458245834584";
var t=Regex.Split(s,"(?=(?:....)*$)");
Console.WriteLine(string.Join(", ", t));
But I don't understand why the result is padded with two empty cells.
I already know that I can use a "Remove Empty Entries" option, but that's not what I'm after.
However, if I change the regex to (?=(?:....)+$), I get only one empty entry, at the start.
Question
Why does the regex produce empty entries, and how can I fix it so that it produces 4 chunks in the first place (without having to trim those empty entries)?

But I don't understand why the result is padded with two empty cells.
Let's try breaking down your regex.
Regex: (?=(?:....)*$)
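Explanation: a lookahead (?=...) for groups of any four characters (?:....), repeated zero or more times up to the end of the string ($). Because it only looks ahead and consumes nothing, every match has zero width.
Since the * quantifier allows zero repetitions, the lookahead succeeds not only at positions 4, 8 and 12 but also at the start of the string (four groups ahead) and at the end of the string (zero groups ahead). Splitting at the start and at the end is what produces the two empty entries.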
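In .NET, Regex.Split adds an empty string whenever a split point is at the very start or end of the input, which is where the extra entries come from. A small C# sketch printing both variants from the question (the brackets only make empty entries visible):

```csharp
using System;
using System.Text.RegularExpressions;

string s = "4581458245834584";

// With *: the lookahead also succeeds at index 0 (16 chars ahead) and at index 16 (0 chars ahead).
string[] withStar = Regex.Split(s, "(?=(?:....)*$)");
// With +: index 16 no longer qualifies (at least one group is required), but index 0 still does.
string[] withPlus = Regex.Split(s, "(?=(?:....)+$)");

Console.WriteLine(string.Join(" ", Array.ConvertAll(withStar, x => "[" + x + "]"))); // [] [4581] [4582] [4583] [4584] []
Console.WriteLine(string.Join(" ", Array.ConvertAll(withPlus, x => "[" + x + "]"))); // [] [4581] [4582] [4583] [4584]
```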
A Regex101 demo visualizes these zero-width matches.
So how can I select only those 3 split points in the middle?
I don't know C# very well, but this three-step method might work for you.
Search for (\d{4}) and replace it with -\1 (written -$1 in a .NET replacement string). The result will be -4581-4582-4583-4584. Demo
Now remove the leading - by replacing ^- with an empty string. The result will be 4581-4582-4583-4584. Demo
Finally, split on -. Demo (it substitutes \n for the - for demonstration purposes).
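A C# sketch of the three steps (not part of the answer). Note that a .NET replacement string refers to group 1 as $1, not \1:

```csharp
using System;
using System.Text.RegularExpressions;

string s = "4581458245834584";

// Step 1: prefix every 4-digit group with '-'.
string step1 = Regex.Replace(s, @"(\d{4})", "-$1");
// Step 2: drop the leading '-'.
string step2 = Regex.Replace(step1, "^-", "");
// Step 3: split on the remaining separators.
string[] chunks = step2.Split('-');

Console.WriteLine(step1);                    // -4581-4582-4583-4584
Console.WriteLine(step2);                    // 4581-4582-4583-4584
Console.WriteLine(string.Join(",", chunks)); // 4581,4582,4583,4584
```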
Alternative solution, inspired by Royi's answer:
Regex: (?=(?!^)(?:\d{4})+$)
Explanation:
(?= // Look ahead for
(?!^) // Not the start of string
(?:\d{4})+$ // Multiple group of 4 digits till end of string
)
Since only lookaround assertions are used and nothing is consumed, it matches the zero-width positions between the groups of 4 digits: (?!^) excludes the start, and + (at least one group ahead) excludes the end.
Regex101 Demo
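The alternative regex can be passed straight to Regex.Split. A minimal C# check:

```csharp
using System;
using System.Text.RegularExpressions;

string s = "4581458245834584";

// Zero-width split points: not at the start ((?!^)), and only where a whole
// number of 4-digit groups (at least one) remains before the end.
string[] chunks = Regex.Split(s, @"(?=(?!^)(?:\d{4})+$)");

Console.WriteLine(string.Join(",", chunks)); // 4581,4582,4583,4584
```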

It seems like I've found an answer.
Looking at those split points, I needed to get rid of the ones at the edges.
So I thought: how can I tell the regex engine "not at the start of the line"?
That is exactly what (?!^) does.
So here is the new regex :
var s="4581458245834584";
var t=Regex.Split(s,"(?!^)(?=(?:....)+$)");
Console.WriteLine(string.Join(", ", t));
Result: 4581, 4582, 4583, 4584 (four chunks, no empty entries).

Umm, I don't see why you need a regex for this; it just overcomplicates things. A better way is to split it manually:
var values = new List<string>();
for (int i = 0; i < 4; i++)
{
    // keep each chunk as a string so leading zeros (e.g. "0123") survive
    values.Add(s.Substring(i * 4, 4));
}
Regex solution:
var s = "4581458245834584";
var separated = Regex.Match(s, "(.{4}){4}").Groups[1].Captures.Cast<Capture>().Select(x => x.Value).ToArray();
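Both approaches as one runnable C# sketch. Keeping the manual chunks as strings (instead of int.Parse) is an assumption here, chosen so that a chunk like 0123 keeps its leading zero:

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text.RegularExpressions;

string s = "4581458245834584";

// Manual approach: four 4-character substrings.
var manual = new List<string>();
for (int i = 0; i < 4; i++)
{
    manual.Add(s.Substring(i * 4, 4));
}

// Regex approach: one match; group 1 is repeated 4 times, and .NET keeps
// every repetition in Groups[1].Captures (not just the last one).
string[] viaCaptures = Regex.Match(s, "(.{4}){4}")
    .Groups[1].Captures
    .Cast<Capture>()
    .Select(c => c.Value)
    .ToArray();

Console.WriteLine(string.Join(" ", manual));      // 4581 4582 4583 4584
Console.WriteLine(string.Join(" ", viaCaptures)); // 4581 4582 4583 4584
```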

It has already been mentioned that the * quantifier also matches at the end of the string, where there are zero groups ahead. To avoid matching at the start and the end, you can use \B (a non-word boundary): in a string of digits it only matches between two word characters, so it never matches at the start or end.
\B(?=(?:.{4})+$)
See demo at regex101
Because \B already keeps the lookahead from matching at the start or end of the string, you could even use * instead of +.
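A minimal C# check that both the + and * variants of the \B version produce exactly four chunks:

```csharp
using System;
using System.Text.RegularExpressions;

string s = "4581458245834584";

// \B fails at index 0 and index 16 (a digit next to the start/end is a word boundary),
// so neither end of the string can become a split point.
string[] plus = Regex.Split(s, @"\B(?=(?:.{4})+$)");
string[] star = Regex.Split(s, @"\B(?=(?:.{4})*$)");

Console.WriteLine(string.Join(" ", plus)); // 4581 4582 4583 4584
Console.WriteLine(string.Join(" ", star)); // 4581 4582 4583 4584
```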

Don't use capturing groups in c# Regex

I am writing a regular expression in Visual Studio 2013 using C#.
I have the following scenario:
Match match = Regex.Match("%%Text%%More text%%More more text", "(?<!^)%%[^%]+%%");
The problem is that I don't want to use capturing groups. With this pattern, match.Value contains %%More text%%, but I want match.Value itself to contain just the string More text.
The string I want will always be between the second and third %%.
Put another way, it will always be between the fourth and fifth %.
I tried:
Regex.Match("%%Text%%More text%%More more text", "(?:(?<!^)%%[^%]+%%)");
But with no luck.
I want to use match.Value because all my regexes are stored in a database table.
Is there a way to "transform" that regex into one that doesn't use capturing groups, so that match.Value contains the desired string?

If you are sure you have no %s inside double %%s, you can just use lookarounds like this:
(?<=^%%[^%]*%%)[^%]+(?=%%)
^^^^^^^^^^^^^^ ^^^^^
If you have single-% delimited strings (like %text1%text2%text3%text4%text5%text6, see demo):
(?<=^%[^%]*%)[^%]+(?=%)
See regex demo
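A minimal C# sketch running the two patterns above on the sample inputs; match.Value holds only the text between the delimiters:

```csharp
using System;
using System.Text.RegularExpressions;

// Double-%% delimiters: text between the 2nd and 3rd %%.
string doubled = Regex.Match("%%Text%%More text%%More more text",
    @"(?<=^%%[^%]*%%)[^%]+(?=%%)").Value;

// Single-% delimiters: text between the 2nd and 3rd %.
string single = Regex.Match("%text1%text2%text3%text4%text5%text6",
    @"(?<=^%[^%]*%)[^%]+(?=%)").Value;

Console.WriteLine(doubled); // More text
Console.WriteLine(single);  // text2
```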
And if it is between the 4th and 5th %%:
(?<=^%%(?:[^%]*%%){3})[^%]+(?=%%)
^^^^^^^^^^^^^^^^^^^^^^ ^^^^^^
For single-% delimited strings (see demo):
(?<=^%(?:[^%]*%){3})[^%]+(?=%)
See another demo
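The same check for the 4th/5th-delimiter patterns; the %% input string is hypothetical, since the question's sample has fewer delimiters:

```csharp
using System;
using System.Text.RegularExpressions;

// Hypothetical input with five %% delimiters.
string doubled = Regex.Match("%%one%%two%%three%%four%%five",
    @"(?<=^%%(?:[^%]*%%){3})[^%]+(?=%%)").Value;

// Single-% delimiters: text between the 4th and 5th %.
string single = Regex.Match("%text1%text2%text3%text4%text5%text6",
    @"(?<=^%(?:[^%]*%){3})[^%]+(?=%)").Value;

Console.WriteLine(doubled); // four
Console.WriteLine(single);  // text4
```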
Both regexes contain a variable-width lookbehind and the same lookahead, which restrict the context in which the one or more non-% characters can appear.
(?<=^%%[^%]*%%) makes sure there is %%[something_other_than_%]%% right after the beginning of the string, and (?<=^%%(?:[^%]*%%){3}) matches %%[substring_not_having_%]%%[substring_not_having_%]%%[substring_not_having_%]%% after the string start.
In case there can be single % symbols inside the double %%, you can use an unroll-the-loop regex (see demo):
(?<=^%%(?:[^%]*(?:%(?!%)[^%]*)*%%){3})[^%]*(?:%(?!%)[^%]*)*(?=%%)
This matches the same text as (?<=^%%(?:.*?%%){3}).*?(?=%%). For short strings, the .*?-based version should be faster; for very long input texts, use the unrolled version.
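To see why the unrolled pattern matters, here is a C# sketch on a hypothetical input whose fields contain single % signs: the [^%]-only version finds nothing, while the unrolled and lazy-dot versions return the 4th field:

```csharp
using System;
using System.Text.RegularExpressions;

// Hypothetical input: the 2nd and 4th fields contain single % signs.
string input = "%%a%%b 5%c%%d%%Fifty % off%%rest";

// [^%]-only version: it cannot step over the single % in "b 5%c", so it finds nothing.
Match simple = Regex.Match(input, @"(?<=^%%(?:[^%]*%%){3})[^%]+(?=%%)");

// Unrolled version: a single % is allowed as long as it is not followed by another %.
Match unrolled = Regex.Match(input,
    @"(?<=^%%(?:[^%]*(?:%(?!%)[^%]*)*%%){3})[^%]*(?:%(?!%)[^%]*)*(?=%%)");

// Lazy-dot version from the answer.
Match lazy = Regex.Match(input, @"(?<=^%%(?:.*?%%){3}).*?(?=%%)");

Console.WriteLine(simple.Success); // False
Console.WriteLine(unrolled.Value); // Fifty % off
Console.WriteLine(lazy.Value);     // Fifty % off
```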

Regular expression for specific combination of alphabets and numbers

I am trying to create a regular expression for the following type of string:
combination of the prefix (XI/ YV/ XD/ YQ/ XZ), numerical digits only, and either no ‘Z’ or a ‘Z’ suffix.
For example, XD35Z should pass but XD01HW should not pass.
So far I have tried the following:
@"XD\d+Z?" - XD35Z passes, but unfortunately XD01HW also matches
@"XD\d+$Z" - XD01HW fails, which is what I want, but XD35Z also fails
I have also tried @"XD\d{1,}Z?" but it did not work.
I need a single regex which will give me appropriate results for both types of strings.

Try this regex:
^(XI|YV|XD|YQ|XZ){1}\d+Z{0,1}$
I'm using quantifier braces to explicitly limit how many times each character/group may occur, and the ^ and $ anchors make sure that the regex matches only the whole line (string).
Broken into logical pieces, this regex checks that the string:
^(XI|YV|XD|YQ|XZ){1} - starts with exactly one of the allowed prefixes,
\d+ - continues with one or more digits,
Z{0,1}$ - ends with zero or one Z.

You're misusing $, which represents the end of the string in a regex.
It should be: @"^XD\d+Z?$" (notice that $ appears at the end of the regex, after the Z?)
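A small C# sketch of why the anchors matter: without ^ and $, the regex only has to match somewhere inside the string, and a $ placed before Z can never be followed by a Z:

```csharp
using System;
using System.Text.RegularExpressions;

// Unanchored: "XD01" inside "XD01HW" is enough for a match.
string unanchored = Regex.Match("XD01HW", @"XD\d+Z?").Value;
// $ before Z: a Z can never follow the end of the string, so this never matches.
bool dollarFirst = Regex.IsMatch("XD35Z", @"XD\d+$Z");
// Anchored at both ends: the whole string must fit the pattern.
bool validOk = Regex.IsMatch("XD35Z", @"^XD\d+Z?$");
bool invalidOk = Regex.IsMatch("XD01HW", @"^XD\d+Z?$");

Console.WriteLine(unanchored);  // XD01
Console.WriteLine(dollarFirst); // False
Console.WriteLine(validOk);     // True
Console.WriteLine(invalidOk);   // False
```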

The regex matching the behaviour you want is:
^(XI|YV|XD|YQ|XZ)\d+Z?$
Explanation:
combination of the prefix (XI/ YV/ XD/ YQ/ XZ)
^(XI|YV|XD|YQ|XZ)
numerical digits only
\d+
either no ‘Z’ or a ‘Z’ suffix
Z?$
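A minimal C# check of this pattern; XD35Z and XD01HW come from the question, the other inputs are hypothetical. The {1}/{0,1} version from the first answer is tested alongside, since those quantifiers are just the long forms of "exactly once" and ?:

```csharp
using System;
using System.Text.RegularExpressions;

var compact = new Regex(@"^(XI|YV|XD|YQ|XZ)\d+Z?$");
var explicitQuantifiers = new Regex(@"^(XI|YV|XD|YQ|XZ){1}\d+Z{0,1}$");

string[] inputs = { "XD35Z", "XD01HW", "YQ7", "XZ123Z", "XD", "XD12ZZ", "AB12" };

foreach (string s in inputs)
{
    bool ok = compact.IsMatch(s);
    // Both spellings of the pattern should always agree.
    if (ok != explicitQuantifiers.IsMatch(s)) throw new Exception($"patterns disagree on {s}");
    Console.WriteLine($"{s}: {ok}");
}
```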

What is the correct RegEx to extract my substring

I have an input string like this:
$(xx.xx.xx)abcde$(yyy.yyy.yyy)fghijk$(zzz.zz.zz.zzz)
I want to pull out each substring of the form $(anything inside here), so for the example above I would like to get 3 substrings.
The characters between the parentheses don't always follow the same pattern.
I have tried using the following regex:
(\$\([a-z]+.*\))
but this matches the whole string, because it starts with '$', has anything in the middle, and ends with ')'.
Hopefully this makes sense.
I should also note that I have very limited experience using regex.
Thanks

(\$\([a-z]+.*?\))
Use ? to make the quantifier non-greedy. * is greedy and consumes as much as it can; adding ? after * makes it lazy, so it stops at the first ).
See demo.
http://regex101.com/r/sU3fA2/28
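In C#, Regex.Matches returns every non-overlapping match. A minimal sketch comparing the lazy pattern with the greedy one from the question:

```csharp
using System;
using System.Linq;
using System.Text.RegularExpressions;

string input = "$(xx.xx.xx)abcde$(yyy.yyy.yyy)fghijk$(zzz.zz.zz.zzz)";

// Lazy .*? stops at the first ')', giving one match per $(...) block.
string[] lazy = Regex.Matches(input, @"(\$\([a-z]+.*?\))")
    .Cast<Match>()
    .Select(m => m.Value)
    .ToArray();

// Greedy .* runs to the last ')', so the whole input is a single match.
string[] greedy = Regex.Matches(input, @"(\$\([a-z]+.*\))")
    .Cast<Match>()
    .Select(m => m.Value)
    .ToArray();

Console.WriteLine(string.Join(" | ", lazy)); // $(xx.xx.xx) | $(yyy.yyy.yyy) | $(zzz.zz.zz.zzz)
Console.WriteLine(greedy.Length);            // 1
```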

Try the following, applied globally (the g flag in regex101; Regex.Matches in .NET):
\((.*?)\)
For the given string $(xx.xx.xx)abcde$(yyy.yyy.yyy)fghijk$(zzz.zz.zz.zzz), it returns the three substrings in capture group 1:
MATCH 1
1. [2-10] `xx.xx.xx`
MATCH 2
1. [18-29] `yyy.yyy.yyy`
MATCH 3
1. [38-51] `zzz.zz.zz.zzz`
http://regex101.com/r/bX7qR2/1
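The g flag is regex101's way of asking for all matches; in .NET the equivalent is Regex.Matches, and Group.Index/Length give the same offsets regex101 reports. A minimal sketch:

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

string input = "$(xx.xx.xx)abcde$(yyy.yyy.yyy)fghijk$(zzz.zz.zz.zzz)";
var report = new List<string>();

foreach (Match m in Regex.Matches(input, @"\((.*?)\)"))
{
    Group g = m.Groups[1];
    // Index is the 0-based start; Index + Length is the exclusive end, as regex101 displays it.
    report.Add($"[{g.Index}-{g.Index + g.Length}] {g.Value}");
}

report.ForEach(line => Console.WriteLine(line));
// [2-10] xx.xx.xx
// [18-29] yyy.yyy.yyy
// [38-51] zzz.zz.zz.zzz
```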
