Regex to extract Variable Part

I have a string containing this: #[User::RootPath]+"Dim_MyPackage10.dtsx" and I need to extract the [User::RootPath] part using a regex. So far I have this regex: [a-zA-Z0-9]*\.dtsx but I don't know how to proceed further.

For the variable, why not consume only what is needed by using a negated character class [^ ], which matches any character except those listed in the set?
The ^ right after the opening bracket negates the class, so [^\]]+ matches everything that is not a ], and [^\"]+ matches everything that is not a quote (").
Then we can place the actual matches in named capture groups (?<NameHere> ) and extract them by name:
string pattern = @"(?:#\[)(?<Path>[^\]]+)(?:\]\+\"")(?<File>[^\""]+)(?:"")";
// Pattern is (?:#\[)(?<Path>[^\]]+)(?:\]\+\")(?<File>[^\"]+)(?:")
// without the doubled quotes ("") that a C# verbatim string requires
string text = @"#[User::RootPath]+""Dim_MyPackage10.dtsx""";
var result = Regex.Match(text, pattern);
Console.WriteLine ("Path: {0}{1}File: {2}",
result.Groups["Path"].Value,
Environment.NewLine,
result.Groups["File"].Value
);
/* Outputs
Path: User::RootPath
File: Dim_MyPackage10.dtsx
*/
(?: ) matches but does not capture. We use those groups as de facto anchors for the pattern, and this keeps their text out of the capture groups.

Use this regex pattern:
\[[^[\]]*\]
Check this demo.

Your regex will match any number of alphanumeric characters, followed by .dtsx. In your example, it would match MyPackage10.dtsx.
If you want to match Dim_MyPackage10.dtsx, you need to add an underscore to the list of allowed characters in the regex: [a-zA-Z0-9_]*\.dtsx
If you want to match the [User::RootPath], you need a regex that will stop at the last / (or \, depends on which type of slashes you use in the paths): something like this: .*\/ (or .*\\)

From the answers and comments - and the fact that none has been 'accepted' so far - it appears to me that the question/problem is not completely clear. If you're looking for the pattern [User::SomeVariable] where only 'SomeVariable' is, well, variable, then you may try:
\[User::\w+]
to capture the full expression.
Furthermore, if you wish to detect that pattern, but then need only the "SomeVariable" part, you may try:
(?<=\[User::)\w+(?=])
which uses look-arounds.
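The two patterns above can be compared side by side. This is a minimal sketch, assuming the input is the string from the question; the class and method names (VariableExtractor, FullExpression, NameOnly) are made up for illustration.

```csharp
using System;
using System.Text.RegularExpressions;

public static class VariableExtractor
{
    // Returns the whole bracketed expression, e.g. "[User::RootPath]".
    // Returns an empty string when nothing matches.
    public static string FullExpression(string input) =>
        Regex.Match(input, @"\[User::\w+]").Value;

    // Returns only the variable name. The look-arounds check for the
    // "[User::" prefix and the "]" suffix without consuming them.
    public static string NameOnly(string input) =>
        Regex.Match(input, @"(?<=\[User::)\w+(?=])").Value;

    public static void Main()
    {
        string input = "#[User::RootPath]+\"Dim_MyPackage10.dtsx\"";
        Console.WriteLine(FullExpression(input)); // [User::RootPath]
        Console.WriteLine(NameOnly(input));       // RootPath
    }
}
```

Both helpers return an empty string for input without a [User::...] part, so callers may want to check Match.Success instead if an empty result is ambiguous.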

Here it is bro
using System;
using System.Text.RegularExpressions;
namespace myapp
{
class Class1
{
static void Main(string[] args)
{
String sourcestring = "source string to match with pattern";
Regex re = new Regex(@"\[\S+\]");
MatchCollection mc = re.Matches(sourcestring);
int mIdx=0;
foreach (Match m in mc)
{
for (int gIdx = 0; gIdx < m.Groups.Count; gIdx++)
{
Console.WriteLine("[{0}][{1}] = {2}", mIdx, re.GetGroupNames()[gIdx], m.Groups[gIdx].Value);
}
mIdx++;
}
}
}
}

Related

How do I regex match each individual word within backticks?

I am trying to get results for each individual word within backticks. For example,
if I have something like this text
some description `match these_words th_is_wor` or `THIS_WOR thi_sqw` a `word_snake`
I want the search results to be:
match
these_words
th_is_wor
THIS_WOR
thi_sqw
word_snake
I'm essentially trying to get each "word", word being one or more english letter or underscore characters, between each set of backticks.
I currently have the following regex that seems to match ALL the text between each set of backticks:
/(?<=`)(\b([^`\\]|\w|_)*\b)(?=`)/gi
This uses a positive lookbehind to find text that comes after a ` character: (?<=`)
Followed by a capture group for zero or more things such that the thing is not a `, not a \, is a word character, or is an _ character within word boundaries: (\b([^`\\]|\w|_)*\b)
Followed by a positive lookahead for another ` character to ensure we're enclosed within backticks.
This sort of works, but captures ALL the text between backticks instead of each individual word. This would require further processing which I'd like to avoid. My regex results right now are:
match these_words th_is_wor
THIS_WOR thi_sqw
word_snake
If there is a generic formula for getting each individual word within backticks or within quotes, that would be fantastic. Thank you!
Note: Much appreciated if the answer could be formatted in C#, but not required, as I can do that bit myself if needed.
Edit: Thank you Mr. إين from Ben Awad's Discord server for the quickest response! This is the solution as proposed by him. Also thank you to everyone who responded to my post, you guys are all AWESOME!
using System;
using System.Text.RegularExpressions;
class Program {
static void Main(string[] args) {
string backtickSentence = "i want to `match these_words th_is_wor` or `THIS_WOR thi_sqw` a `word_snake`";
string backtickPattern = @"(?<=^[^`]*(?:`[^`]*`[^`]*)*`(?:[^`]* )*)\w+";
string quoteSentence = "some other \"words in a \" sentence be \"gettin me tripped_up AllUp inHere\"";
string quotePattern = "(?<=^[^\"]*(?:\"[^\"]*\"[^\"]*)*\"(?:[^\"]* )*)\\w+";
// Call Matches method without specifying any options.
try {
foreach (Match match in Regex.Matches(backtickSentence, backtickPattern, RegexOptions.None, TimeSpan.FromSeconds(1)))
Console.WriteLine("Found '{0}' at position {1}", match.Value, match.Index);
Console.WriteLine();
foreach (Match match in Regex.Matches(quoteSentence, quotePattern, RegexOptions.None, TimeSpan.FromSeconds(1)))
Console.WriteLine("Found '{0}' at position {1}", match.Value, match.Index);
}
catch (RegexMatchTimeoutException) {} // Do Nothing: Assume that timeout represents no match.
Console.WriteLine();
// Call Matches method for case-insensitive matching.
try {
foreach (Match match in Regex.Matches(backtickSentence, backtickPattern, RegexOptions.IgnoreCase))
Console.WriteLine("Found '{0}' at position {1}", match.Value, match.Index);
Console.WriteLine();
foreach (Match match in Regex.Matches(quoteSentence, quotePattern, RegexOptions.IgnoreCase))
Console.WriteLine("Found '{0}' at position {1}", match.Value, match.Index);
}
catch (RegexMatchTimeoutException) {}
}
}
His explanation for this was as follows, but you can paste his regex into regexr.com for more info:
var NOT_BACKTICK = @"[^`]*";
var WORD = @"(\w+)";
var START = $@"^{NOT_BACKTICK}"; // match anything before the first backtick
var INSIDE_BACKTICKS = $@"`{NOT_BACKTICK}`"; // match a pair of backticks
var ODD_NUM_BACKTICKS_BEFORE = $@"{START}({INSIDE_BACKTICKS}{NOT_BACKTICK})*`"; // match anything before the first backtick, then any amount of paired backticks with anything afterwards, then a single opening backtick
var CONDITION = $@"(?<={ODD_NUM_BACKTICKS_BEFORE})";
var CONDITION_TRUE = $@"(?: *{WORD})"; // match any spaces then a word
var CONDITION_FALSE = $@"(?:(?<={ODD_NUM_BACKTICKS_BEFORE}{NOT_BACKTICK} ){WORD})"; // match up to an opening backtick, then anything up to a space before the current word
// uses conditional matching
// see https://learn.microsoft.com/en-us/dotnet/standard/base-types/alternation-constructs-in-regular-expressions#Conditional_Expr
var pattern = $@"(?{CONDITION}{CONDITION_TRUE}|{CONDITION_FALSE})";
// refined backtick pattern
string backtickPattern = @"(?<=^[^`]*(?:`[^`]*`[^`]*)*`(?:[^`]* )*)\w+";

With C# you can use the Group.Captures Property and then get the capture group values.
Note that \w also matches _
`(?:[\p{Zs}\t]*(\w+)[\p{Zs}\t]*)+`
Explanation
` Match literally
(?: Non capture group to repeat as a whole part
[\p{Zs}\t]* Match optional spaces or tabs
(\w+) Capture group 1, match 1+ word characters
[\p{Zs}\t]* Match optional spaces or tabs
)+ Close the non capture group and repeat 1 or more times
` Match literally
See a .NET regex demo and a C# demo.
For example:
string s = @"some description ` match these_words th_is_wor ` or `THIS_WOR thi_sqw` a `word_snake`";
string pattern = @"`(?:[\p{Zs}\t]*(\w+)[\p{Zs}\t]*)+`";
foreach (Match m in Regex.Matches(s, pattern))
{
string[] result = m.Groups[1].Captures.Select(c => c.Value).ToArray();
Console.WriteLine(String.Join(',', result));
}
Output
match,these_words,th_is_wor
THIS_WOR,thi_sqw
word_snake

Multiple strings have special characters in regex

I am new to regular expressions. I have a requirement to find "/./" or "/../" in a string. My program looks as follows:
String Path1 = "https://18.56.199.56/Directory1/././Directory2/filename.txt";
String Path2 = "https://18.56.199.56/Directory1/../../Directory2/filename.txt";
String Path3 = "https://18.56.199.56/Directory1/Directory2/filename.txt";
Regex nameRegex = new Regex(@"[/./]+[/../]");
bool b = nameRegex.IsMatch(OrginalURL);
This code returns true for Path3 as well, even though it does not contain any "/./" or "/../".
It seems the expression in Regex nameRegex = new Regex(@"[/./]+[/../]"); is not correct. Kindly correct this expression.
The regex match should succeed for Path1 and Path2, but not for Path3.

Your [/./]+[/../] (=[/.]+[/.]) regex matches 1+ / or . chars followed with a / or .. It can thus match ....../, /////////////, and certainly // in the protocol part.
If you do not have to use a regex you may simply use .Contains:
if (s.Contains("/../") || s.Contains("/./")) { ... }
See this C# demo.
You may use the following regex, too:
bool b = Regex.IsMatch(OrginalURL, @"/\.{1,2}/");
See this regex demo.
Details
/ - a / char
\.{1,2} - 1 or 2 dots
/ - a / char.
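As a quick check of the pattern described above, the sketch below runs it against the three paths from the question. The class and method names (DotSegmentCheck, HasDotSegment) are placeholders chosen for this example.

```csharp
using System;
using System.Text.RegularExpressions;

public static class DotSegmentCheck
{
    // True when the URL contains "/./" or "/../":
    // a slash, one or two literal dots, then another slash.
    public static bool HasDotSegment(string url) =>
        Regex.IsMatch(url, @"/\.{1,2}/");

    public static void Main()
    {
        string[] urls =
        {
            "https://18.56.199.56/Directory1/././Directory2/filename.txt",
            "https://18.56.199.56/Directory1/../../Directory2/filename.txt",
            "https://18.56.199.56/Directory1/Directory2/filename.txt"
        };

        foreach (string url in urls)
        {
            // Prints True, True, False for the three paths above.
            Console.WriteLine("{0}: {1}", HasDotSegment(url), url);
        }
    }
}
```

The dots in the IP address and in filename.txt do not trigger a match because none of them directly follows a slash.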

While this would not be the best way to do this task, an expression similar to:
\/\.{1,2}(?=\/)
might work.
The / characters are escaped only for demo purposes; in C# you can drop those backslashes.
Test
using System;
using System.Text.RegularExpressions;
public class Example
{
public static void Main()
{
string pattern = @"\/\.{1,2}(?=\/)";
string input = @"https://18.56.199.56/Directory1/./Directory2/filename.txt
https://18.56.199.56/Directory1/././Directory2/filename.txt
https://18.56.199.56/Directory1/../../../Directory2/filename.txt
https://18.56.199.56/Directory1/./../.../Directory2/filename.txt
https://18.56.199.56/Directory1/Directory2/filename.txt";
RegexOptions options = RegexOptions.Multiline;
foreach (Match m in Regex.Matches(input, pattern, options))
{
Console.WriteLine("'{0}' found at index {1}.", m.Value, m.Index);
}
}
}

Simple regex-matching

I have a String
String test = @"Lists/Versions/2_.000";
I'm a bit confused on how to use regex to do this.
I'm using the pattern
String pattern = @"\D+";
The MSDN page for regular expressions says \D "Matches any character other than a decimal digit".
So shouldn't it be returning 'Lists/Versions/' , '2'?
However, it's returning
'' , '2', '000'
I would like the pattern to match only the 2 (or any integer). How would I do that?
String url = @"Lists/Versions/2_.000";
String pattern = @"\D+";
string[] substrings = Regex.Split(url, pattern);
foreach (string match in substrings)
{
    Console.WriteLine("'{0}'", match);
}

The reason you are seeing this is that \D matches non-digits, and Regex.Split splits the input on every run of them. The string starts with non-digits (Lists/Versions/), which yields an empty first element, and the _. between 2 and 000 is another separator, so two separate numeric values (2 and 000) come back. So you have a couple of choices:
Break the string into manageable portions, then index into the array.
Build a better pattern to separate.
So the question will be, what are you trying to parse? 2.00? Or are you trying to separate the numbers in your string?
I'm assuming you may also have mixed up the character classes:
\d Matches a digit character. Roughly equivalent to [0-9].
\D Matches a non-digit character. Roughly equivalent to [^0-9].
\w Matches any word character including underscore. Roughly equivalent to "[A-Za-z0-9_]".
\W Matches any non-word character. Roughly equivalent to "[^A-Za-z0-9_]".
You should simply do the following:
string url = @"Lists/Versions/2_.000";
var data = Regex.Split(url, @"\D+");
// data[0] is an empty string because the input starts with non-digits
Console.WriteLine("Value: {0} and Secondary Value: {1}", data[1], data[2]);
That splits out all integer values, so it should provide an output of:
Value: 2 and Secondary Value: 000
Regex.Split returns a normal string[]. My syntax or expression may be off, but you can find a nice cheat sheet for Regular Expressions here. You'll also want to ensure you check the bounds of the array.
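To make the difference concrete, here is a small sketch that contrasts splitting on \D+ with matching \d+ directly. The helper names (DigitRuns, SplitOnNonDigits, MatchDigits) are invented for this example.

```csharp
using System;
using System.Linq;
using System.Text.RegularExpressions;

public static class DigitRuns
{
    // Splits on runs of non-digits. A leading separator produces
    // an empty first element.
    public static string[] SplitOnNonDigits(string input) =>
        Regex.Split(input, @"\D+");

    // Collects the runs of digits themselves, so no empty entries appear.
    public static string[] MatchDigits(string input) =>
        Regex.Matches(input, @"\d+")
             .Cast<Match>()
             .Select(m => m.Value)
             .ToArray();

    public static void Main()
    {
        string url = "Lists/Versions/2_.000";
        Console.WriteLine(string.Join("|", SplitOnNonDigits(url))); // |2|000
        Console.WriteLine(string.Join("|", MatchDigits(url)));      // 2|000
        Console.WriteLine(Regex.Match(url, @"\d+").Value);          // 2
    }
}
```

If only the first integer is wanted, the single Regex.Match call on the last line avoids the array and its bounds checks altogether.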

https://dotnetfiddle.net/BU6gp2
using System;
using System.Text.RegularExpressions;
public class Program
{
public static void Main()
{
String url = @"Lists/Versions/2_.000";
String pattern = @"\D+";
string[] substrings = Regex.Split(url, pattern);
Console.WriteLine("'{0}'", substrings[1]);
}
}

Please try the following:
// using System.Linq;
String url = @"Lists/Versions/2_.000";
String pattern = @"(?<=/)\d+";
string[] substrings = Regex.Matches(url, pattern)
.Cast<Match>()
.Select(_ => _.Value)
.ToArray();
foreach (string match in substrings)
{
Console.WriteLine("'{0}'", match);
}
Alternatively, if you don't need an array.
String url = @"Lists/Versions/2_.000";
String pattern = @"(?<=/)\d+";
Console.WriteLine("'{0}'", Regex.Match(url, pattern).Value);

How do I match a regex pattern and extract data from it

I can have zero or more substrings within a text area in the format {key-value}Some text{/key}.
For example: This is my {link-123}test{/link} text area
I'd like to iterate through any items that match this pattern, perform an action based on the key and value, then replace the substring with a new string (an anchor link that is retrieved by the action based on the key).
How would I achieve this in C#?

If these tags are not nested, then you only need to iterate once over the file; if nesting is possible, then you need to do one iteration for each level of nesting.
This answer assumes that braces only occur as tag delimiters (and not, for example, inside comments):
result = Regex.Replace(subject,
    @"\{                  # opening brace
    (?<key>\w+)           # Match the key (alnum), capture into the group 'key'
    -                     # dash
    (?<value>\w+)         # Match the value (alnum), capture it as above
    \}                    # closing brace
    (?<content>           # Match and capture into the group 'content':
      (?:                 # Match...
        (?!\{/?\k<key>)   # (unless there's an opening or closing tag
        .                 # of the same name right here) any character
      )*                  # any number of times
    )                     # End of capturing group
    \{/\k<key>\}          # Match the closing tag.",
    new MatchEvaluator(ComputeReplacement), RegexOptions.Singleline | RegexOptions.IgnorePatternWhitespace);

public String ComputeReplacement(Match m) {
    // You can vary the replacement text for each match on-the-fly
    // m.Groups["key"].Value will contain the key
    // m.Groups["value"].Value will contain the value of the match
    // m.Groups["content"].Value will contain the content between the tags
    return ""; // change this to return the string you generated here
}
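For the non-nested case, the approach above could be wired up roughly as follows. This is only a sketch: BuildAnchor and its /items/ URL scheme are hypothetical stand-ins for whatever action actually resolves a key and value to a link.

```csharp
using System;
using System.Text.RegularExpressions;

public static class TagReplacer
{
    // Same pattern as above, written on one line without comments.
    private static readonly Regex TagPattern = new Regex(
        @"\{(?<key>\w+)-(?<value>\w+)\}(?<content>(?:(?!\{/?\k<key>).)*)\{/\k<key>\}",
        RegexOptions.Singleline);

    // Hypothetical action: turns a "link" tag into an anchor element.
    // Unknown keys fall back to the plain content.
    public static string BuildAnchor(string key, string value, string content) =>
        key == "link"
            ? $"<a href=\"/items/{value}\">{content}</a>"
            : content;

    // Replaces every {key-value}content{/key} occurrence in one pass.
    public static string ReplaceTags(string text) =>
        TagPattern.Replace(text, m => BuildAnchor(
            m.Groups["key"].Value,
            m.Groups["value"].Value,
            m.Groups["content"].Value));

    public static void Main()
    {
        string input = "This is my {link-123}test{/link} text area";
        // This is my <a href="/items/123">test</a> text area
        Console.WriteLine(ReplaceTags(input));
    }
}
```

Text without any tags passes through unchanged, which covers the "0 substrings" case from the question.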

Something like this?
Regex.Replace(text,
    @"[{](?<key>[^-]+)-(?<value>[^}]+)[}](?<content>.*?)[{][/]\k<key>[}]",
    match => {
        var key = match.Groups["key"].Value;
        var value = match.Groups["value"].Value;
        var content = match.Groups["content"].Value;
        return string.Format("The content of {0}-{1} is {2}", key, value, content);
    });

Use the .net Regular Expression libraries. Here is an example that uses the Matches method:
http://www.dotnetperls.com/regex-matches
For replacing text, consider using a templating engine such as Antlr
http://www.antlr.org/wiki/display/ANTLR3/Antlr+3+CSharp+Target
Here is the example from the Matches Blog
using System;
using System.Text.RegularExpressions;
class Program
{
static void Main()
{
// Input string.
const string value = @"said shed see spear spread super";
// Get a collection of matches.
MatchCollection matches = Regex.Matches(value, @"s\w+d");
// Use foreach loop.
foreach (Match match in matches)
{
foreach (Capture capture in match.Captures)
{
Console.WriteLine("Index={0}, Value={1}", capture.Index, capture.Value);
}
}
}
}
For more information on the C# regular expression syntax, you could use this cheat sheet:
http://www.mikesdotnetting.com/Article/46/CSharp-Regular-Expressions-Cheat-Sheet

Regexp skip pattern

Problem
I need to replace all asterisk symbols('*') with percent symbol('%'). The asterisk symbols in square brackets should be ignored.
Example
[Test]
public void Replace_all_asterisks_outside_the_square_brackets()
{
var input = "Hel[*o], w*rld!";
var output = Regex.Replace(input, "What_pattern_should_be_there?", "%");
Assert.AreEqual("Hel[*o], w%rld!", output);
}

Try using a look ahead:
\*(?![^\[\]]*\])
Here's a bit stronger solution, which takes care of [] blocks better, and even escaped \[ characters:
string text = @"h*H\[el[*o], w*rl\]d!";
string pattern = @"
\\. # Match an escaped character. (to skip over it)
|
\[ # Match a character class
(?:\\.|[^\]])* # which may also contain escaped characters (to skip over it)
\]
|
(?<Asterisk>\*) # Match `*` and add it to a group.
";
text = Regex.Replace(text, pattern,
match => match.Groups["Asterisk"].Success ? "%" : match.Value,
RegexOptions.IgnorePatternWhitespace);
If you don't care about escaped characters you can simplify it to:
\[ # Skip a character class
[^\]]* # until the first ']'
\]
|
(?<Asterisk>\*)
Which can be written without comments as: @"\[[^\]]*\]|(?<Asterisk>\*)".
To understand why it works we need to understand how Regex.Replace works: for every position in the string it tries to match the regex. If it fails, it moves one character. If it succeeds, it moves over the whole match.
Here, we have dummy matches for the [...] blocks so we may skip over the asterisks we don't want to replace, and match only the lonely ones. That decision is made in a callback function that checks if Asterisk was matched or not.
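The skip-over technique described above can be tried on the input from the question with the simplified pattern. This is a minimal sketch; the class and method names (AsteriskReplacer, ReplaceOutsideBrackets) are illustrative only.

```csharp
using System;
using System.Text.RegularExpressions;

public static class AsteriskReplacer
{
    // First alternative: a whole [...] block, matched only so it can be
    // copied back unchanged. Second alternative: a lone asterisk.
    private const string Pattern = @"\[[^\]]*\]|(?<Asterisk>\*)";

    public static string ReplaceOutsideBrackets(string input) =>
        Regex.Replace(input, Pattern,
            // Replace only when the Asterisk group took part in the match;
            // otherwise return the bracket block as it was.
            m => m.Groups["Asterisk"].Success ? "%" : m.Value);

    public static void Main()
    {
        Console.WriteLine(ReplaceOutsideBrackets("Hel[*o], w*rld!")); // Hel[*o], w%rld!
        Console.WriteLine(ReplaceOutsideBrackets("Hel*o], w*rld!"));  // Hel%o], w%rld!
    }
}
```

In the second call there is no opening bracket, so no block is skipped and both asterisks are replaced.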

I couldn't come up with a pure RegEx solution. Therefore I am providing you with a pragmatic solution. I tested it and it works:
[Test]
public void Replace_all_asterisks_outside_the_square_brackets()
{
var input = "H*]e*l[*o], w*rl[*d*o] [o*] [o*o].";
var actual = ReplaceAsterisksNotInSquareBrackets(input);
var expected = "H%]e%l[*o], w%rl[*d*o] [o*] [o*o].";
Assert.AreEqual(expected, actual);
}
private static string ReplaceAsterisksNotInSquareBrackets(string s)
{
Regex rx = new Regex(@"(?<=\[[^\[\]]*)(?<asterisk>\*)(?=[^\[\]]*\])");
var matches = rx.Matches(s);
s = s.Replace('*', '%');
foreach (Match match in matches)
{
s = s.Remove(match.Groups["asterisk"].Index, 1);
s = s.Insert(match.Groups["asterisk"].Index, "*");
}
return s;
}

EDITED
Okay here is my final attempt ;)
Using negative lookbehind (?<!) and negative lookahead (?!).
var output = Regex.Replace(input, @"(?<!\[)\*(?!\])", "%");
This also passes the test in the comment to another answer "Hel*o], w*rld!"
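One thing worth checking before relying on this pattern: the look-arounds only inspect the characters immediately next to the asterisk. The sketch below, with made-up names (AdjacentBracketCheck, Replace), runs it on the two inputs mentioned here plus one extra input of my own where the asterisk sits in the middle of a bracketed block.

```csharp
using System;
using System.Text.RegularExpressions;

public static class AdjacentBracketCheck
{
    // Replaces an asterisk unless it is directly preceded by '['
    // or directly followed by ']'.
    public static string Replace(string input) =>
        Regex.Replace(input, @"(?<!\[)\*(?!\])", "%");

    public static void Main()
    {
        Console.WriteLine(Replace("Hel[*o], w*rld!")); // Hel[*o], w%rld!
        Console.WriteLine(Replace("Hel*o], w*rld!"));  // Hel%o], w%rld!
        // Extra input, not from the question: the asterisk is inside the
        // brackets but not adjacent to either one, so it is replaced too.
        Console.WriteLine(Replace("[a*b]"));           // [a%b]
    }
}
```

So this pattern fits inputs where a bracketed asterisk always touches a bracket; for longer bracketed blocks a different approach may be needed.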
