Get the matches of a pattern in a string

I have the following regular expression:
^[[][A-Za-z_1-9]+[\]]$
I want to get all the matches of this regular expression in a string. Each match should be of the form [Whatever]. Inside the brackets there could also be an _ or numeric characters. So I wrote the following code:
private const String REGEX = @"^[[][A-Za-z_1-9]+[\]]$";
static void Main(string[] args)
{
String expression = "([ColumnName] * 500) / ([AnotherColumn] - 50)";
MatchCollection matches = Regex.Matches(expression, REGEX);
foreach (Match match in matches)
{
Console.WriteLine(match.Value);
}
Console.ReadLine();
}
But unfortunately, matches always has a count of zero. Apparently, the regular expression is checking whether the whole string is a match rather than finding the matches inside the string. I'm not sure whether the regular expression is wrong or I'm using Regex.Matches() incorrectly.
Any thoughts?

You need to remove the start/end of string anchors (^ and $) from your pattern, since the matches you are looking for are not actually at the start and end of the string. You can also just use \[ and \] instead of [[] and [\]]:
private const String REGEX = @"\[[A-Za-z_1-9]+\]";
Should do the trick. Note that 1-9 does not include the digit 0; use 0-9 if the names can contain zeros.

You're anchoring your regex to the beginning and end of the string, so of course it won't match anything.
Removing the anchors (^ for beginning and $ for end) works fine:
[[][A-Za-z_1-9]+[\]]
It returns, as you would hopefully expect:
[ColumnName]
[AnotherColumn]
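For reference, a minimal sketch of the unanchored version as a complete program. It assumes C# 9 top-level statements (.NET 5 or later); the group name `name` is a hypothetical addition that returns the text without the brackets, and 0-9 replaces 1-9 on the assumption that names may contain a zero.

```csharp
using System;
using System.Linq;
using System.Text.RegularExpressions;

// No ^ or $ anchors, so the pattern can match anywhere in the input.
// The named group "name" captures the text between the brackets.
const string Pattern = @"\[(?<name>[A-Za-z_0-9]+)\]";

string expression = "([ColumnName] * 500) / ([AnotherColumn] - 50)";

// Matches returns every non-overlapping match, scanning left to right.
string[] withBrackets = Regex.Matches(expression, Pattern)
    .Select(m => m.Value)
    .ToArray();

string[] namesOnly = Regex.Matches(expression, Pattern)
    .Select(m => m.Groups["name"].Value)
    .ToArray();

Console.WriteLine(string.Join(", ", withBrackets)); // [ColumnName], [AnotherColumn]
Console.WriteLine(string.Join(", ", namesOnly));    // ColumnName, AnotherColumn
```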

Related

Matching optional slash in regex

I need a regex that matches the first two path segments between three "/" characters in a URL: e.g. in /en/help/test/abc/def it should match /en/help/.
I use this regex: /.*?/(.*?)/ but sometimes the URL lacks the last slash, as in /en/help, and then it does not match because that last slash is missing.
Can you help me adjust the regex to match only the "/en/help" part? Thanks

A simple way to solve it is to replace reluctant (.*?)/ with greedy ([^/]*):
/.*?/([^/]*)
This would stop at the third slash if there is one, or at the end of the string if the final slash is not there.
Note that you could replace .*? with the same [^/]* expression for consistency:
/[^/]*/([^/]*)
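A quick check of the second pattern against both URLs, written as a sketch with C# 9 top-level statements; `Prefix` is a hypothetical helper name.

```csharp
using System;
using System.Text.RegularExpressions;

// [^/]* cannot cross a slash, so the match ends either just before the
// third slash or at the end of the string when that slash is missing.
var pattern = new Regex(@"/[^/]*/([^/]*)");

string Prefix(string url)
{
    Match m = pattern.Match(url);
    return m.Success ? m.Value : string.Empty;
}

string a = Prefix("/en/help/test/abc/def");
string b = Prefix("/en/help");

Console.WriteLine(a); // /en/help
Console.WriteLine(b); // /en/help
```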

If the path segments contain only word characters (letters, digits and underscore), you can use the following pattern:
static void Main(string[] args)
{
string s1 = "/en/help/test/abc/def";
string s2 = "/en/help ";
string pattern =
@"(?ix) #Options
/ #This will match first slash
\w+ #This will match word characters (letters, digits, underscore)
/ #This will match second slash
\w+ #Finally, this again will match word characters until the 3rd slash (or end)";
foreach(string s in new[] { s1, s2})
{
var match = Regex.Match(s, pattern);
if (match.Success) Console.WriteLine($"Found: '{match.Value}'");
}
}

How to replace two first characters before underscore with regex?

I have strings like these:
HU_husnummer
HU_Adrs
How can I replace HU with MI,
so that they become MI_husnummer and MI_Adrs?
I am not very good at regex but I would like to solve it with regex.
EDIT:
The sample code I have now, which still does not work, is:
string test = Regex.Replace("[HU_husnummer] int NOT NULL","^HU","MI");

Judging by your comments, you actually need
string test = Regex.Replace("[HU_husnummer] int NOT NULL",@"^\[HU","[MI");
Have a look at the demo
In case your input string really starts with HU, remove the \[ from the regex pattern and the [ from the replacement.
The regex is @"^\[HU" (note the verbatim string literal notation used for the regex pattern):
^ - matches the start of the string
\[ - matches a literal [ (it must be escaped because [ is a special regex metacharacter that begins a character class)
HU - matches HU literally.
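A small sketch (C# 9 top-level statements assumed) that runs the asker's pattern and the escaped pattern on the same input, showing why the first one leaves the string untouched.

```csharp
using System;
using System.Text.RegularExpressions;

string input = "[HU_husnummer] int NOT NULL";

// ^HU fails because the string starts with '[', not 'H', so nothing is replaced.
string unchanged = Regex.Replace(input, "^HU", "MI");

// Escaping the bracket lets the anchored pattern match "[HU" at position 0.
string replaced = Regex.Replace(input, @"^\[HU", "[MI");

Console.WriteLine(unchanged); // [HU_husnummer] int NOT NULL
Console.WriteLine(replaced);  // [MI_husnummer] int NOT NULL
```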

String varString="HU_husnummer ";
varString=varString.Replace("HU_","MI_");
Links
https://msdn.microsoft.com/en-us/library/system.string.replace(v=vs.110).aspx
http://www.dotnetperls.com/replace

Using Substring:
var abc = "HU_husnummer";
var result = "MI" + abc.Substring(2);
Or with Regex.Replace:
string result = Regex.Replace(abc, "^HU", "MI");
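`Substring(2)` swaps whatever the first two characters are. If only strings that really start with `HU_` should change, a guarded variant could look like this sketch (C# 9 top-level statements assumed); `SwapPrefix` is a hypothetical name, and the regex line uses a lookahead so the underscore is required but not replaced.

```csharp
using System;
using System.Text.RegularExpressions;

// Hypothetical helper: only rewrite strings that actually start with "HU_".
string SwapPrefix(string s) =>
    s.StartsWith("HU_", StringComparison.Ordinal) ? "MI" + s.Substring(2) : s;

string viaSubstring = SwapPrefix("HU_husnummer");
string untouched = SwapPrefix("XX_Adrs");

// Regex equivalent: (?=_) requires the underscore without consuming it.
string viaRegex = Regex.Replace("HU_Adrs", "^HU(?=_)", "MI");

Console.WriteLine(viaSubstring); // MI_husnummer
Console.WriteLine(untouched);    // XX_Adrs
Console.WriteLine(viaRegex);     // MI_Adrs
```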

Regex to extract Variable Part

I have a string containing this: @[User::RootPath]+"Dim_MyPackage10.dtsx" and I need to extract the [User::RootPath] part using a regex. So far I have this regex: [a-zA-Z0-9]*\.dtsx but I don't know how to proceed further.

For the variable, why not consume only what is needed with a negated character class [^...], which matches everything except the characters in the set?
The ^ at the start of the brackets negates the set; below, [^\]] matches everything that is not a ] and [^\"] everything that is not a quote (").
Then we can put the parts we want into named capture groups (?<Name>...) and extract them by name:
string pattern = @"(?:@\[)(?<Path>[^\]]+)(?:\]\+\"")(?<File>[^\""]+)(?:"")";
// Pattern is (?:@\[)(?<Path>[^\]]+)(?:\]\+\")(?<File>[^\"]+)(?:")
// i.e. without the doubled quotes that the C# verbatim string requires
string text = @"@[User::RootPath]+""Dim_MyPackage10.dtsx""";
var result = Regex.Match(text, pattern);
Console.WriteLine ("Path: {0}{1}File: {2}",
result.Groups["Path"].Value,
Environment.NewLine,
result.Groups["File"].Value
);
/* Outputs
Path: User::RootPath
File: Dim_MyPackage10.dtsx
*/
(?: ) is a non-capturing group: its contents must match, but the text is not captured. We use these groups as de facto anchors for the pattern while keeping them out of the named capture groups.

Use this regex pattern:
\[[^[\]]*\]
Check this demo.

Your regex will match any number of alphanumeric characters, followed by .dtsx. In your example, it would match MyPackage10.dtsx.
If you want to match Dim_MyPackage10.dtsx, you need to add an underscore to the list of allowed characters in the regex (and escape the dot so it matches only a literal dot): [a-zA-Z0-9_]*\.dtsx
If you want to match [User::RootPath], you need a regex that stops at the last / (or \, depending on which type of slash your paths use), something like this: .*\/ (or .*\\)
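A sketch comparing the two character classes on the question's string (C# 9 top-level statements assumed):

```csharp
using System;
using System.Text.RegularExpressions;

// The question's string; "" inside a verbatim literal is one quote character.
string text = @"@[User::RootPath]+""Dim_MyPackage10.dtsx""";

// Without _ in the class, the match cannot include "Dim_".
string withoutUnderscore = Regex.Match(text, @"[a-zA-Z0-9]*\.dtsx").Value;

// With _ added, the whole file name matches; \. is a literal dot.
string withUnderscore = Regex.Match(text, @"[a-zA-Z0-9_]*\.dtsx").Value;

Console.WriteLine(withoutUnderscore); // MyPackage10.dtsx
Console.WriteLine(withUnderscore);    // Dim_MyPackage10.dtsx
```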

From the answers and comments - and the fact that none has been 'accepted' so far - it appears to me that the question/problem is not completely clear. If you're looking for the pattern [User::SomeVariable] where only 'SomeVariable' is, well, variable, then you may try:
\[User::\w+]
to capture the full expression.
Furthermore, if you wish to detect that pattern, but then need only the "SomeVariable" part, you may try:
(?<=\[User::)\w+(?=])
which uses look-arounds.
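Both patterns applied to the question's string, as a sketch assuming C# 9 top-level statements:

```csharp
using System;
using System.Text.RegularExpressions;

string text = @"@[User::RootPath]+""Dim_MyPackage10.dtsx""";

// Full expression, brackets included. In .NET an unescaped ']' outside a
// character class is treated as a literal character.
string full = Regex.Match(text, @"\[User::\w+]").Value;

// The lookbehind and lookahead check the surrounding text without consuming
// it, so only the variable name ends up in the match.
string nameOnly = Regex.Match(text, @"(?<=\[User::)\w+(?=])").Value;

Console.WriteLine(full);     // [User::RootPath]
Console.WriteLine(nameOnly); // RootPath
```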

Here it is bro
using System;
using System.Text.RegularExpressions;
namespace myapp
{
class Class1
{
static void Main(string[] args)
{
String sourcestring = @"@[User::RootPath]+""Dim_MyPackage10.dtsx"""; // the string from the question
Regex re = new Regex(@"\[\S+\]");
MatchCollection mc = re.Matches(sourcestring);
int mIdx=0;
foreach (Match m in mc)
{
for (int gIdx = 0; gIdx < m.Groups.Count; gIdx++)
{
Console.WriteLine("[{0}][{1}] = {2}", mIdx, re.GetGroupNames()[gIdx], m.Groups[gIdx].Value);
}
mIdx++;
}
}
}
}

C# Regular Expressions

I have a string that has multiple regular expression groups, and some parts of the string that aren't in the groups. I need to remove a character (in this case ^) only within certain groups, but not in the parts of the string that aren't in a regex group.
Here's the input string:
STARTDONTREPLACEME^ENDDONTREPLACEME~STARTREPLACEME^ENDREPLACEME~STARTREPLACEME^BLAH^ENDREPLACEME~STARTDONTREPLACEME^BLAH^ENDDONTREPLACEME~
Here's what the output string should look like:
STARTDONTREPLACEME^ENDDONTREPLACEME~STARTREPLACEMEENDREPLACEME~STARTREPLACEMEBLAHENDREPLACEME~STARTDONTREPLACEME^BLAH^ENDDONTREPLACEME~
I need to do it using C# and can use regular expressions.
I can match the string into groups of those that should and shouldn't be replaced, but am struggling on how to return the final output string.

I'm not sure I get exactly what you're having trouble with, but it didn't take long to come up with this result:
string strRegex = @"STARTREPLACEME(.+)ENDREPLACEME";
RegexOptions myRegexOptions = RegexOptions.None;
Regex myRegex = new Regex(strRegex, myRegexOptions);
string strTargetString = @"STARTDONTREPLACEME^ENDDONTREPLACEME~STARTREPLACEME^ENDREPLACEME~STARTREPLACEME^BLAH^ENDREPLACEME~STARTDONTREPLACEME^BLAH^ENDDONTREPLACEME~";
string strReplace = "STARTREPLACEMEENDREPLACEME";
return myRegex.Replace(strTargetString, strReplace);
I built this with my favorite online regex tool: http://regexhero.net/tester/
Is that helpful?

Regex rgx = new Regex(
@"\^(?=(?>(?:(?!(?:START|END)(?:DONT)?REPLACEME).)*)ENDREPLACEME)");
string s1 = rgx.Replace(s0, String.Empty);
Explanation: Each time a ^ is found, the lookahead scans ahead for an ending delimiter (ENDREPLACEME). If it finds one without seeing any of the other delimiters first, the match must have occurred inside a REPLACEME group. If the lookahead reports failure, it indicates that the ^ was found either between groups or within a DONTREPLACEME group.
Because lookaheads are zero-width assertions, only the ^ will actually be consumed in the event of a successful match.
Be aware that this will only work if delimiters are always properly balanced and groups are never nested within other groups.
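A runnable sketch of the lookahead approach on the question's input, assuming C# 9 top-level statements; `s0` and `s1` follow the answer's names.

```csharp
using System;
using System.Text.RegularExpressions;

string s0 = "STARTDONTREPLACEME^ENDDONTREPLACEME~STARTREPLACEME^ENDREPLACEME~STARTREPLACEME^BLAH^ENDREPLACEME~STARTDONTREPLACEME^BLAH^ENDDONTREPLACEME~";

// A ^ is removed only when the first delimiter after it is ENDREPLACEME.
// The atomic group scans forward and stops at any START/END delimiter.
var rgx = new Regex(
    @"\^(?=(?>(?:(?!(?:START|END)(?:DONT)?REPLACEME).)*)ENDREPLACEME)");
string s1 = rgx.Replace(s0, string.Empty);

Console.WriteLine(s1);
```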

If you are able to separate into groups that should be replaced and those that shouldn't, then instead of providing a single replacement string, you should be able to use a MatchEvaluator (a delegate that takes a Match and returns a string) to make the decision of which case it is currently dealing with and return the replacement string for that group alone.
You may also use an additional regex inside the MatchEvaluator. This solution produces the expected output:
// Lazy .+? keeps each match inside a single STARTREPLACEME...ENDREPLACEME pair
Regex outer = new Regex(@"STARTREPLACEME.+?ENDREPLACEME", RegexOptions.Compiled);
Regex inner = new Regex(@"\^", RegexOptions.Compiled);
string replaced = outer.Replace(start, m =>
{
return inner.Replace(m.Value, String.Empty);
});
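A self-contained version of the MatchEvaluator approach (C# 9 top-level statements assumed). It uses a lazy `.+?` so each outer match stays inside one STARTREPLACEME...ENDREPLACEME pair; the hypothetical `tricky` input places a DONTREPLACEME group between two REPLACEME groups to show where a greedy `.+` would overreach.

```csharp
using System;
using System.Text.RegularExpressions;

string start = "STARTDONTREPLACEME^ENDDONTREPLACEME~STARTREPLACEME^ENDREPLACEME~STARTREPLACEME^BLAH^ENDREPLACEME~STARTDONTREPLACEME^BLAH^ENDDONTREPLACEME~";

// Lazy .+? stops at the first ENDREPLACEME, so each match is one group.
var outer = new Regex(@"STARTREPLACEME.+?ENDREPLACEME");
var greedy = new Regex(@"STARTREPLACEME.+ENDREPLACEME");
var inner = new Regex(@"\^");

// Used as the MatchEvaluator: strips ^ from the matched text only; the rest
// of the input is copied through unchanged by Regex.Replace.
string StripCarets(Match m) => inner.Replace(m.Value, string.Empty);

string replaced = outer.Replace(start, StripCarets);

// Hypothetical input with a DONTREPLACEME group between two REPLACEME groups.
string tricky = "STARTREPLACEME^ENDREPLACEME~STARTDONTREPLACEME^ENDDONTREPLACEME~STARTREPLACEME^ENDREPLACEME~";
string lazyResult = outer.Replace(tricky, StripCarets);
string greedyResult = greedy.Replace(tricky, StripCarets); // also strips the DONT group's ^

Console.WriteLine(replaced);
Console.WriteLine(lazyResult);
Console.WriteLine(greedyResult);
```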

Regexp skip pattern

Problem
I need to replace all asterisk symbols ('*') with percent symbols ('%'). Asterisks inside square brackets should be ignored.
Example
[Test]
public void Replace_all_asterisks_outside_the_square_brackets()
{
var input = "Hel[*o], w*rld!";
var output = Regex.Replace(input, "What_pattern_should_be_there?", "%");
Assert.AreEqual("Hel[*o], w%rld!", output);
}

Try using a lookahead:
\*(?![^\[\]]*\])
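A sketch of the lookahead pattern on a few inputs (C# 9 top-level statements assumed). The lookahead treats an asterisk as bracketed whenever a `]` follows before any other bracket, so it assumes balanced, non-nested brackets; the last input shows that a stray `]` also protects the asterisk in front of it.

```csharp
using System;
using System.Text.RegularExpressions;

// Replace * unless a ']' follows it before any other '[' or ']',
// which (for balanced, non-nested brackets) means it sits inside [...].
const string Pattern = @"\*(?![^\[\]]*\])";

string simple = Regex.Replace("Hel[*o], w*rld!", Pattern, "%");
string mixed = Regex.Replace("a*b [c*d] e*f", Pattern, "%");
// Limitation: a stray ']' also protects the asterisk before it.
string stray = Regex.Replace("H*]", Pattern, "%");

Console.WriteLine(simple); // Hel[*o], w%rld!
Console.WriteLine(mixed);  // a%b [c*d] e%f
Console.WriteLine(stray);  // H*]
```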
Here's a somewhat stronger solution, which handles [] blocks better and even skips escaped \[ characters:
string text = @"h*H\[el[*o], w*rl\]d!";
string pattern = @"
\\. # Match an escaped character. (to skip over it)
|
\[ # Match a character class
(?:\\.|[^\]])* # which may also contain escaped characters (to skip over it)
\]
|
(?<Asterisk>\*) # Match `*` and add it to a group.
";
text = Regex.Replace(text, pattern,
match => match.Groups["Asterisk"].Success ? "%" : match.Value,
RegexOptions.IgnorePatternWhitespace);
If you don't care about escaped characters you can simplify it to:
\[ # Skip a character class
[^\]]* # until the first ']'
\]
|
(?<Asterisk>\*)
Which can be written without comments as: @"\[[^\]]*\]|(?<Asterisk>\*)".
To understand why it works, we need to understand how Regex.Replace works: at each position in the string it tries to match the regex. If that fails, it moves ahead one character; if it succeeds, it continues right after the end of the match.
Here, we have dummy matches for the [...] blocks, so the engine skips over the asterisks we don't want to replace and matches only the lone ones. The decision is made in a callback (the MatchEvaluator) that checks whether the Asterisk group was matched.
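The simplified alternation with a MatchEvaluator, as a sketch assuming C# 9 top-level statements; `ReplaceOutsideBrackets` is a hypothetical helper name. The second input is the one used in the test of the next answer.

```csharp
using System;
using System.Text.RegularExpressions;

// Either a whole [...] block (put back unchanged) or a lone * captured as "Asterisk".
var rx = new Regex(@"\[[^\]]*\]|(?<Asterisk>\*)");

// Hypothetical helper: the evaluator decides per match what to put back.
string ReplaceOutsideBrackets(string input) =>
    rx.Replace(input, m => m.Groups["Asterisk"].Success ? "%" : m.Value);

string a = ReplaceOutsideBrackets("Hel[*o], w*rld!");
string b = ReplaceOutsideBrackets("H*]e*l[*o], w*rl[*d*o] [o*] [o*o].");

Console.WriteLine(a); // Hel[*o], w%rld!
Console.WriteLine(b); // H%]e%l[*o], w%rl[*d*o] [o*] [o*o].
```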

I couldn't come up with a pure RegEx solution. Therefore I am providing you with a pragmatic solution. I tested it and it works:
[Test]
public void Replace_all_asterisks_outside_the_square_brackets()
{
var input = "H*]e*l[*o], w*rl[*d*o] [o*] [o*o].";
var actual = ReplaceAsterisksNotInSquareBrackets(input);
var expected = "H%]e%l[*o], w%rl[*d*o] [o*] [o*o].";
Assert.AreEqual(expected, actual);
}
private static string ReplaceAsterisksNotInSquareBrackets(string s)
{
Regex rx = new Regex(@"(?<=\[[^\[\]]*)(?<asterisk>\*)(?=[^\[\]]*\])");
var matches = rx.Matches(s);
s = s.Replace('*', '%');
foreach (Match match in matches)
{
s = s.Remove(match.Groups["asterisk"].Index, 1);
s = s.Insert(match.Groups["asterisk"].Index, "*");
}
return s;
}

EDITED
Okay, here is my final attempt ;)
Using negative lookbehind (?<!) and negative lookahead (?!):
var output = Regex.Replace(input, @"(?<!\[)\*(?!\])", "%");
This also passes the test from the comment on another answer: "Hel*o], w*rld!"
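A quick sketch of the adjacent-only behaviour (C# 9 top-level statements assumed): the lookarounds inspect only the single characters right before and after the asterisk, so an asterisk with other characters on both sides inside brackets is still replaced.

```csharp
using System;
using System.Text.RegularExpressions;

// (?<!\[) and (?!\]) look only at the characters immediately around the *.
const string Pattern = @"(?<!\[)\*(?!\])";

string simple = Regex.Replace("Hel[*o], w*rld!", Pattern, "%");
// Inside "[o*o]" the * has 'o' on both sides, so neither lookaround rejects it.
string inner = Regex.Replace("[o*o]", Pattern, "%");

Console.WriteLine(simple); // Hel[*o], w%rld!
Console.WriteLine(inner);  // [o%o]
```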
