Regex - how do I match the first part of an indexed path - C#

For the lines
Tester[0]/Test[4]/testId
Tester[0]/Test[4]/testId
Test[1]/Test[4]/testId
Test[2]/Test[4]/testId
I want to match the first part of the path, including the first [, the n, the ] and the first /,
so for the lines above I would get:
Tester[0]
Tester[0]
Test[1]
Test[2]
I have tried using:
var rx = new Regex(@"^\[.*\]\/");
var res = rx.Replace("Tester[0]/Test[4]/testId", "", 1 /*only one occurrence */);
I get
res == "testId";
rather than
res == "Test[4]/testId"
which is what I'm hoping for.
So it's matching the first opening square bracket and the last closing bracket.
I need it to match only the first closing bracket.
Update:
Sorry, I am also trying to match the first forward slash:
Tester[0]/
Tester[0]/
Test[1]/
Test[2]/
Solution:
Making the quantifiers non-greedy with "?" removes only the first segment:
var rx = new Regex(@"^.*?\[.*?\]\/");
var res = rx.Replace("Tester[0]/Test[4]/testId", "", 1 /*only one occurrence */);

I'm assuming this was your original regex pattern: ^.*\[.*\]/ (the pattern in your question does not match the lines).
This pattern uses greedy quantifiers (*), so, even though we only requested one match, the pattern itself matches more than we'd like. As you noticed, it matched until the second occurrence of the square brackets.
We can make this pattern non-greedy by adding question marks to the quantifiers: ^.*?\[.*?\]/.
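A minimal C# sketch comparing the greedy and non-greedy patterns on the sample line, both with Match and with the count-limited Replace from the question (it assumes a .NET 5 or later console project, since it uses top-level statements):

```csharp
using System;
using System.Text.RegularExpressions;

var line = "Tester[0]/Test[4]/testId";

// Greedy: .* runs to the end of the line and backtracks only as far as needed,
// so the match extends to the LAST "]/" in the line.
var greedy = new Regex(@"^.*\[.*\]/");

// Lazy: .*? grows one character at a time and stops as soon as the rest of
// the pattern can match, so the match ends at the FIRST "]/".
var lazy = new Regex(@"^.*?\[.*?\]/");

Console.WriteLine(greedy.Match(line).Value);   // Tester[0]/Test[4]/
Console.WriteLine(lazy.Match(line).Value);     // Tester[0]/

// Replace only the first occurrence (count = 1), as in the question.
string greedyRest = greedy.Replace(line, "", 1);
string lazyRest = lazy.Replace(line, "", 1);
Console.WriteLine(greedyRest);                 // testId
Console.WriteLine(lazyRest);                   // Test[4]/testId
```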
Although this works for your use case, a better pattern may be: ^[^/]+/. This removes everything up to and including the first forward slash. The [^ ... ] is a negated character class (the brackets are unrelated to the brackets in the strings we're matching against). In this case, it matches any character that isn't a forward slash.
For this simple text manipulation, though, we could just use String.Substring() instead of regular expressions:
line.Substring(line.IndexOf('/') + 1);
This is faster and easier to understand than a regular expression pattern.
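A small C# sketch running both suggestions over the sample lines: the negated-class pattern to grab the first segment, and Substring to get the rest (top-level statements, .NET 5 or later assumed):

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

string[] lines =
{
    "Tester[0]/Test[4]/testId",
    "Test[1]/Test[4]/testId",
    "Test[2]/Test[4]/testId",
};

// [^/]+ is a negated character class: one or more characters other than '/'.
var firstSegment = new Regex(@"^[^/]+/");

var heads = new List<string>();   // first segment, including the '/'
var rests = new List<string>();   // everything after the first '/'

foreach (var line in lines)
{
    heads.Add(firstSegment.Match(line).Value);

    // Substring alternative, no regex involved. If a line had no '/',
    // IndexOf would return -1 and Substring(0) would keep the whole line.
    rests.Add(line.Substring(line.IndexOf('/') + 1));
}

Console.WriteLine(string.Join(", ", heads));   // Tester[0]/, Test[1]/, Test[2]/
Console.WriteLine(string.Join(", ", rests));   // Test[4]/testId, Test[4]/testId, Test[4]/testId
```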

You can use a lookahead and lookbehind approach to find the match and replace it accordingly.
With lookarounds, your regex would look like this:
(?=/).*(?<=])

Is this the sort of thing you are looking for?
Updated:
var str="Tester[0]/Test[4]/testId\nTester[0]/Test[4]/testId\nTest[1]/Test[4]/testId\nTest[2]/Test[4]/testId"
console.log(str)
// Tester[0]/Test[4]/testId
// Tester[0]/Test[4]/testId
// Test[1]/Test[4]/testId
// Test[2]/Test[4]/testId
var str2=str.replace(/\/.+/mg,"")
console.log(str2)
// Tester[0]
// Tester[0]
// Test[1]
// Test[2]
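This works by starting the match at the first '/' and ending it at the end of the line (. does not match a newline), then replacing the match with an empty string. The g flag makes the replace global, so every line is processed; the m (multi-line) flag is not strictly needed here, since the pattern uses no ^ or $ anchors.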
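For comparison, a rough C# equivalent of the same replace (a sketch, top-level statements, .NET 5 or later assumed). Regex.Replace already replaces every match, so no global flag is needed:

```csharp
using System;
using System.Text.RegularExpressions;

var str = "Tester[0]/Test[4]/testId\nTester[0]/Test[4]/testId\nTest[1]/Test[4]/testId\nTest[2]/Test[4]/testId";

// As in JavaScript, '.' in .NET does not match '\n' by default, so each
// match runs from the first '/' on a line to the end of that line.
var result = Regex.Replace(str, "/.+", "");

Console.WriteLine(result);
// Tester[0]
// Tester[0]
// Test[1]
// Test[2]
```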

Related

Split credit card number into 4 chunks using Regex lookahead?

I want to chunk a credit card number (in my case I always have 16 digits) into 4 chunks of 4 digits.
I've succeeded in doing it via a positive lookahead:
var s="4581458245834584";
var t=Regex.Split(s,"(?=(?:....)*$)");
Console.WriteLine(string.Join(",", t));
But I don't understand why the result contains two padded empty cells:
I already know that I can use the "Remove Empty Entries" flag, but I'm not after that.
However, if I change the regex to (?=(?:....)+$), then I get this result:
Question
Why does the regex emit empty cells, and how can I fix my regex so it produces 4 chunks in the first place (without having to 'trim' those empty entries)?
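But I don't understand why the result contains two padded empty cells:
Let's try breaking down your regex.
Regex: (?=(?:....)*$)
Explanation: the lookahead (?=...) asserts that what follows is a group of any 4 characters (?:....), repeated zero or more times, up to the end of the string. Because it only looks ahead, it consumes nothing, so every match is zero-width.
Since the * quantifier allows zero repetitions, the lookahead also succeeds at the very end of the string (zero groups ahead), and nothing stops it from succeeding at the beginning (four groups ahead). Splitting at those two edge positions produces the empty first and last entries.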
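A minimal C# sketch that lists the zero-width positions the original pattern matches and shows how splitting there yields the empty edge entries (top-level statements, .NET 5 or later assumed):

```csharp
using System;
using System.Linq;
using System.Text.RegularExpressions;

var s = "4581458245834584";

// Positions where (?=(?:....)*$) succeeds: every position followed by a
// multiple of 4 characters, including 0 characters (the end of the string).
int[] positions = Regex.Matches(s, "(?=(?:....)*$)")
    .Cast<Match>()
    .Select(match => match.Index)
    .ToArray();
Console.WriteLine(string.Join(",", positions));   // 0,4,8,12,16

// Splitting at position 0 and at position 16 produces the empty first and last entries.
string[] parts = Regex.Split(s, "(?=(?:....)*$)");
Console.WriteLine(string.Join("|", parts));        // |4581|4582|4583|4584|
```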
You can visualize this in a Regex101 demo.
So how can I select only the 3 split points in the middle?
I don't know C# very well, but this 3-step method might work for you:
Search with (\d{4}) and replace with -\1 (in .NET replacement strings this is written -$1). Result will be -4581-4582-4583-4584. Demo
Now remove the first - by searching with ^- and replacing it with nothing. Result will be 4581-4582-4583-4584. Demo
Finally, split on -. Demo (the demo substitutes \n for - for display purposes).
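A minimal C# sketch of the 3-step method (top-level statements, .NET 5 or later assumed). Note that .NET replacement strings refer to the first group as $1:

```csharp
using System;
using System.Text.RegularExpressions;

var s = "4581458245834584";

// Step 1: prefix every group of 4 digits with '-'.
string step1 = Regex.Replace(s, @"(\d{4})", "-$1");   // -4581-4582-4583-4584

// Step 2: drop the leading '-'.
string step2 = Regex.Replace(step1, "^-", "");        // 4581-4582-4583-4584

// Step 3: split on '-'.
string[] chunks = step2.Split('-');

Console.WriteLine(string.Join(" ", chunks));          // 4581 4582 4583 4584
```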
Alternative solution, inspired by Royi's answer:
Regex: (?=(?!^)(?:\d{4})+$)
Explanation:
(?= // Look ahead for
(?!^) // Not the start of string
(?:\d{4})+$ // Multiple group of 4 digits till end of string
)
Since only lookaround assertions are used, nothing is consumed: each match is a zero-width position after a group of 4 digits, but never at the start or the end of the string.
Regex101 Demo
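A minimal C# sketch of splitting with this pattern, alongside Royi's equivalent (?!^)(?=(?:....)+$), which should produce the same four chunks (top-level statements, .NET 5 or later assumed):

```csharp
using System;
using System.Text.RegularExpressions;

var s = "4581458245834584";

// (?!^) rules out position 0; (?:\d{4})+$ needs at least one full group ahead,
// which rules out the end of the string. Only positions 4, 8 and 12 remain.
string[] chunks = Regex.Split(s, @"(?=(?!^)(?:\d{4})+$)");

// Royi's version: the same two conditions, written side by side.
string[] royi = Regex.Split(s, "(?!^)(?=(?:....)+$)");

Console.WriteLine(string.Join(",", chunks));   // 4581,4582,4583,4584
Console.WriteLine(string.Join(",", royi));     // 4581,4582,4583,4584
```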
It seems like I've found an answer.
Looking at those split points, I needed to get rid of the edges.
So I thought: how can I tell the regex engine "not at the start of the line"?
Which is exactly what (?!^) does.
So here is the new regex:
var s="4581458245834584";
var t=Regex.Split(s,"(?!^)(?=(?:....)+$)");
Console.WriteLine(string.Join(",", t));
Result:
Umm, I don't know WHY you need Regex for this; it just overcomplicates things. A better way is to split it manually:
var values = new List<string>();
for (int i = 0; i < 4; i++)
{
    // Keep each chunk as a string: int.Parse would drop leading zeros (e.g. "0123" -> 123).
    values.Add(s.Substring(i * 4, 4));
}
Regex solution:
var s = "4581458245834584";
var separated = Regex.Match(s, "(.{4}){4}").Groups[1].Captures.Cast<Capture>().Select(x => x.Value).ToArray();
As already mentioned, with the * quantifier the lookahead also matches at the end of the string, where zero groups remain ahead. To avoid matching at the start and end, you can use \B, the non-word-boundary assertion: in an all-digit string it only matches between two digits, so it gives no matches at the start or end.
\B(?=(?:.{4})+$)
See demo at regex101
Because the lookahead is never reached at the start or end of the string (\B fails there first), you could even use * instead of +.
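A minimal C# sketch checking that the \B version gives four chunks with either + or * (top-level statements, .NET 5 or later assumed):

```csharp
using System;
using System.Text.RegularExpressions;

var s = "4581458245834584";

// \B fails wherever a word character meets the start or end of the string,
// so in an all-digit string it can only succeed between two digits.
string[] withPlus = Regex.Split(s, @"\B(?=(?:.{4})+$)");
string[] withStar = Regex.Split(s, @"\B(?=(?:.{4})*$)");

Console.WriteLine(string.Join(",", withPlus));   // 4581,4582,4583,4584
Console.WriteLine(string.Join(",", withStar));   // 4581,4582,4583,4584
```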

Why does regex return one digit?

I want to get the last digits from strings.
For example: "Text11" -> 11; "Te1xt32" -> 32, etc.
I wrote this regex:
var regex = new Regex(@"^(.+)(?<Number>(\d+))(\z)");
And I use it like this:
regex.Match(input).Groups["Number"].Value;
That returns 1 for "Text11" and 2 for "Te1xt32" instead of 11 and 32.
So my question is: why does \d+ get only the last digit?
Because the leading .+ is greedy by default: it first matches all the way to the end of the string, then backtracks one character at a time only until \d+ can match, which leaves \d+ with just the last digit. Add ? after the + to make it non-greedy (lazy), so that .+? matches as little as possible and \d+ gets the whole trailing number:
var regex = new Regex(@"^(.+?)(?<Number>(\d+))(\z)");
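A minimal C# sketch comparing the greedy and lazy versions on both example inputs (top-level statements, .NET 5 or later assumed):

```csharp
using System;
using System.Text.RegularExpressions;

var greedy = new Regex(@"^(.+)(?<Number>(\d+))(\z)");
var lazy = new Regex(@"^(.+?)(?<Number>(\d+))(\z)");

foreach (var input in new[] { "Text11", "Te1xt32" })
{
    // Greedy .+ gives back just one character, so \d+ gets a single digit.
    // Lazy .+? grows one character at a time, so \d+ gets the whole trailing run.
    string g = greedy.Match(input).Groups["Number"].Value;
    string l = lazy.Match(input).Groups["Number"].Value;
    Console.WriteLine($"{input}: greedy={g}, lazy={l}");
}
// Text11: greedy=1, lazy=11
// Te1xt32: greedy=2, lazy=32
```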
DEMO
As an alternative, you can use the same regex in RightToLeft mode:
var input = "Te1xt32";
// I removed some unnecessary capturing groups in your regex
var regex = new Regex(@"^(.+)(?<Number>\d+)\z", RegexOptions.RightToLeft);
// You need to specify the starting index as the end of the string
Match m = regex.Match(input, input.Length);
Console.WriteLine(m.Groups[1].Value);
Console.WriteLine(m.Groups["Number"].Value);
Demo on ideone
Since what you want to find is at the end of the string and the part in front has no particular pattern, going from right to left avoids some backtracking, though the difference, if any, is likely to be insignificant here.
RightToLeft mode, as the name suggests, performs the match from right to left, so the numbers at the end of the string will be greedily consumed by \d+ before the rest is consumed by .+.
You can simply do:
var regex = new Regex(@"(?<Number>\d+)\z");
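A minimal C# sketch of this simpler pattern, including a hypothetical input with no trailing digits to show the failed-match case (top-level statements, .NET 5 or later assumed):

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

// No leading .+ at all: scanning left to right, the first position where \d+
// can run all the way to the end (\z) is the start of the trailing digits.
var regex = new Regex(@"(?<Number>\d+)\z");

var results = new List<string>();
foreach (var input in new[] { "Text11", "Te1xt32", "NoDigits" })
{
    Match m = regex.Match(input);
    results.Add(m.Success ? m.Groups["Number"].Value : "(none)");
}

Console.WriteLine(string.Join(", ", results));   // 11, 32, (none)
```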

Regular expression and removing signs

I'm new to regular expressions. I've got a little problem and I can't find the answer. I'm looking for redundant brackets using this regular expression:
public Regex RedundantBrackets = new Regex("[(](\\s?)[a-z](\\s?)[)]");
When I find a match I want to modify the string like this:
text1 (text2) text3 => text1 text2 text3, so as you can see I only want to remove the brackets. How can I do this? I tried the Replace method, but with it I can only replace the whole "(text2)" match.
Thanks in advance!
Try this replace:
Regex.Replace("text1 (text2) text3", // Input
@"([()])", // Pattern to match
string.Empty) // Item to replace
/* result: text1 text2 text3*/
Explanation
Regex.Replace scans the whole string for matches and replaces each one it finds. Our match pattern is ([()]), which breaks down as follows:
The outer ( and ) form a capturing group; every ( in a pattern needs a closing ), otherwise the pattern is unbalanced. (The group is not strictly required here: [()] on its own behaves the same.)
[ and ] define a character class, or set: it matches exactly one character, taken from the characters listed inside. The most common example is [A-Z], which matches any single character in the range A to Z. Here we define our own set. *Remember: [ ] tells the regex engine to match 1 character, chosen from the set listed within it.
( and ) inside our set [()] (which could equally be written [)(]) give a set of two characters: the opening and closing parentheses.
So, taken all together, the pattern matches any single character that is either ( or ), and each match is replaced with string.Empty.
When we run the replace on your text it finds two matches, the ( before text2 and the ) after it, and both are replaced with string.Empty.
First off, it can be handy to use verbatim strings so you don't have to escape the backslashes etc.
public Regex RedundantBrackets = new Regex(@"[(]\s?([a-z]+)\s?[)]");
We wrap [a-z] in parentheses because that's what we're trying to capture. We can then use $1 to put that capture into the replacement:
RedundantBrackets.Replace("text (text) text", "$1");
EDIT: I forgot to add repetition to [a-z] => [a-z]+
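A minimal C# sketch of the capture-and-replace approach (top-level statements, .NET 5 or later assumed). One caveat worth checking against your real data: [a-z]+ does not match digits, so on the question's literal text1 (text2) text3 this pattern finds no match and the string comes back unchanged:

```csharp
using System;
using System.Text.RegularExpressions;

// Verbatim string (@"...") so the backslashes need no escaping.
var redundantBrackets = new Regex(@"[(]\s?([a-z]+)\s?[)]");

// $1 re-inserts whatever ([a-z]+) captured, without the brackets.
string fixedText = redundantBrackets.Replace("text (text) text", "$1");
Console.WriteLine(fixedText);   // text text text

// "text2" contains a digit, which [a-z]+ cannot match, so nothing is replaced.
string unchanged = redundantBrackets.Replace("text1 (text2) text3", "$1");
Console.WriteLine(unchanged);   // text1 (text2) text3
```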
This will remove every character that is not a word character or a space:
finalString = Regex.Replace(finalString, @"[^\w ]", "");
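A minimal C# sketch of what this pattern does; the second input is a hypothetical example showing that other punctuation is removed too, not just brackets (top-level statements, .NET 5 or later assumed):

```csharp
using System;
using System.Text.RegularExpressions;

// [^\w ] matches any single character that is neither a word character
// (letter, digit or underscore, among others) nor a space.
string finalString = "text1 (text2) text3";
finalString = Regex.Replace(finalString, @"[^\w ]", "");
Console.WriteLine(finalString);   // text1 text2 text3

// Commas, exclamation marks and the like disappear as well.
string other = Regex.Replace("Hello, world!", @"[^\w ]", "");
Console.WriteLine(other);         // Hello world
```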

Regex replace/search using values/variables in search text

What is the regex syntax to use part of a matched expression in the subsequent part of the search?
So, for example, if I have:
"{marker=1}some text{/marker=1}"
or
"{marker=2}some text{/marker=2}"
I want to use the first digit found in the pattern to find the second digit. So in
"{marker=1}{marker=2}some text{/marker=2}{/marker=1}"
the regex would match the 1's and then the 2's.
So far I've come up with {marker=(\d)}(.*?){/marker=(\d)} but don't know how to specify the second \d to refer to the value found in the first \d.
I'm doing this in C#.
Try:
{marker=(\d)}(.*?){/marker=(\1)}
A numbered backreference is written \n (\1, \2, ...), so \1 should work here:
Regex re = new Regex(@"\{marker=(\d)\}(.*?)\{/marker=(\1)\}");
// expect to work
Console.WriteLine(re.IsMatch(@"{marker=1}some text{/marker=1}"));
// expect to fail (end marker is different)
Console.WriteLine(re.IsMatch(@"{marker=1}some text{/marker=2}"));
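Matches do not overlap, so on the nested example a single pass only finds the outer {marker=1} pair. A minimal C# sketch, assuming you want the 1's and then the 2's, is to match the outer pair and then search its body (group 2) again; Walk is a hypothetical helper name (top-level statements, .NET 5 or later assumed):

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

// \1 must repeat exactly the digit captured by group 1.
var marker = new Regex(@"\{marker=(\d)\}(.*?)\{/marker=\1\}");
var visited = new List<string>();

// Hypothetical helper: record each marker id, then look inside its body for nested pairs.
void Walk(string text)
{
    foreach (Match match in marker.Matches(text))
    {
        visited.Add(match.Groups[1].Value);
        Walk(match.Groups[2].Value);
    }
}

Walk("{marker=1}{marker=2}some text{/marker=2}{/marker=1}");
Console.WriteLine(string.Join(" then ", visited));   // 1 then 2
```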

Extending [^,]+, Regular Expression in C#

Duplicate
Regex for variable declaration and initialization in c#
I was looking for a Regular Expression to parse CSV values, and I came across this Regular Expression
[^,]+
It does the job by splitting the words at every occurrence of ",". What I want to know is: say I have the string
value_name v1,v2,v3,v4,...
Now I want a regular expression to find the words v1,v2,v3,v4...
I tried:
^value_name\s+([^,]+)*
But it didn't work for me. Can you tell me what I am doing wrong? I remember working on regular expressions and their state-machine implementation. Doesn't it work the same way?
If a string starts with value_name followed by one or more whitespace characters, go to the next state. In that state, read a word until a "," comes. Then do it again! And each word will be grouped!
Am I wrong in understanding this?
You could use a Regex similar to those proposed:
(?:^value_name\s+)?([^,]+)(?:\s*,\s*)?
The first group is non-capturing and would match the start of the line and the value_name.
To ensure that the Regex is still valid over all matches, we make that group optional by using the '?' modifier (meaning match at most once).
The second group is capturing and would match your vXX data.
The third group is non-capturing and would match the comma and any whitespace before and after it.
Again, we make it optional by using the '?' modifier, otherwise the last 'vXX' group would not match unless we ended the string with a final ','.
In your trials, the Regex wouldn't match multiple times: remember that if you want a Regex to match multiple occurrences in a string, the whole Regex needs to match every single occurrence, so you have to build your Regex to match not only the start of the string 'value_name', but also every occurrence of 'vXX' in it.
In C#, you could list all matches and groups using code like this:
Regex r = new Regex(@"(?:^value_name\s+)?([^,]+)(?:\s*,\s*)?");
Match m = r.Match(subjectString);
while (m.Success) {
for (int i = 1; i < m.Groups.Count; i++) {
Group g = m.Groups[i];
if (g.Success) {
// matched text: g.Value
// match start: g.Index
// match length: g.Length
}
}
m = m.NextMatch();
}
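A self-contained C# sketch of the same loop, collecting group 1 from each match; the sample value assigned to subjectString is a hypothetical stand-in for your own text (top-level statements, .NET 5 or later assumed):

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

// Hypothetical sample input standing in for your own subject string.
string subjectString = "value_name v1,v2,v3,v4";

var r = new Regex(@"(?:^value_name\s+)?([^,]+)(?:\s*,\s*)?");
var values = new List<string>();

// Each successive match picks up one value; group 1 holds the value itself.
for (Match m = r.Match(subjectString); m.Success; m = m.NextMatch())
{
    values.Add(m.Groups[1].Value);
}

Console.WriteLine(string.Join(" | ", values));   // v1 | v2 | v3 | v4
```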
I would expect it to get only v1 in the group, because the first comma is "blocking" it from grabbing the rest of the fields. How you handle this depends on the methods you use on the regular expression, but it may make sense to make two passes: first grab all the fields separated by commas, then break things up on spaces. Perhaps ^value_name\s+(?:([^,]+),?)* instead.
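In .NET specifically, that last pattern can return every field in one pass: a repeated group's Value holds only its last iteration, but Group.Captures keeps all of them. A minimal sketch (top-level statements, .NET 5 or later assumed):

```csharp
using System;
using System.Linq;
using System.Text.RegularExpressions;

var m = Regex.Match("value_name v1,v2,v3,v4", @"^value_name\s+(?:([^,]+),?)*");

// Value holds only the LAST iteration of the repeated group.
string last = m.Groups[1].Value;

// Captures holds every iteration, in order.
string[] all = m.Groups[1].Captures.Cast<Capture>().Select(c => c.Value).ToArray();

Console.WriteLine(last);                   // v4
Console.WriteLine(string.Join(",", all));  // v1,v2,v3,v4
```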
Oh yeah, lists....
/(?:^value_name\s+|,\s*)([^,]+)/g should theoretically grab them, but you will have to call RegExp.exec() in a loop to get the captures, rather than the whole match.
I wish pre-matches worked in JS :(.
Otherwise, go with Logan's idea: /^value_name\s+([^,]+(?:,\s*[^,]+)*)$/ followed by .split(/,\s*/);
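A rough C# translation of the two-pass idea (validate and capture the whole list, then split it), sketched with a hypothetical sample line (top-level statements, .NET 5 or later assumed):

```csharp
using System;
using System.Text.RegularExpressions;

// Hypothetical sample line.
var input = "value_name v1, v2, v3, v4";

// Pass 1: check the whole line and capture the comma-separated list.
Match m = Regex.Match(input, @"^value_name\s+([^,]+(?:,\s*[^,]+)*)$");

// Pass 2: split the captured list on a comma plus optional whitespace.
string[] values = m.Success ? Regex.Split(m.Groups[1].Value, @",\s*") : Array.Empty<string>();

Console.WriteLine(string.Join("|", values));   // v1|v2|v3|v4
```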
