Struggling to match exact regex to my strings - c#

I have these strings
string1 = CD.TR.DRC/TF8
string2 = CD.TR.DRC/TF8/A8
string3 = CD.TR.DRC/TF8.PB
string4 = DRC/TF8
string5 = DDRC/TF8
I am trying to match DRC/TF8 exactly, so only string1, string3 and string4 should return true. Could someone please suggest how I could achieve that using regex?

I would say this will work:
\bDRC\/TF8(?=\.|$)
\b asserts a word boundary, so DRC must start a word (DDRC does not qualify)
(?=\.|$) is a positive lookahead which asserts that the match is followed by a . or by the end of the line
See example: https://regexr.com/634a3
Detailed syntax for C# can be found in this post.
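As a minimal sketch, the pattern above can be checked against the five sample strings from the question. The class and method names (DrcMatcher, IsDrcTf8) are made up for illustration; the forward slash does not need escaping in .NET, so it is left unescaped here.

```csharp
using System.Text.RegularExpressions;

public static class DrcMatcher
{
    // \b       : word boundary, so the extra D in "DDRC" blocks the match
    // (?=\.|$) : positive lookahead for a literal dot or the end of the input
    private static readonly Regex Pattern = new Regex(@"\bDRC/TF8(?=\.|$)");

    // Hypothetical helper: true when the input contains DRC/TF8 as a whole segment.
    public static bool IsDrcTf8(string s)
    {
        return Pattern.IsMatch(s);
    }
}
```

Under these assumptions, DrcMatcher.IsDrcTf8 should return true for string1, string3 and string4, and false for string2 and string5.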

Based on your current examples you can use this pattern: (?<=\.|^)DRC\/TF8(?=\.|$)
code:
using System;
using System.Text.RegularExpressions;
public class Test{
public static void Main(){
string pattern = @"(?<=\.|^)DRC\/TF8(?=\.|$)";
Regex re = new Regex(pattern);
string[] text = {"CD.TR.DRC/TF8", "CD.TR.DRC/TF8/A8", "CD.TR.DRC/TF8.PB", "DRC/TF8", "DDRC/TF8"};
foreach(string str in text){
if (re.IsMatch(str)){
Console.WriteLine(str);
}
}
}
}
output:
CD.TR.DRC/TF8
CD.TR.DRC/TF8.PB
DRC/TF8

Related

Multiple strings have special characters in regex

I am new to regular expressions. I have a requirement to find "/./" or
"/../" in a string. My program looks as follows:
String Path1 = "https://18.56.199.56/Directory1/././Directory2/filename.txt";
String Path2 = "https://18.56.199.56/Directory1/../../Directory2/filename.txt";
String Path3 = "https://18.56.199.56/Directory1/Directory2/filename.txt";
Regex nameRegex = new Regex(@"[/./]+[/../]");
bool b = nameRegex.IsMatch(OrginalURL);
This code also returns true for Path3, which does not contain "/./" or "/../".
It seems the expression "Regex nameRegex = new Regex(@"[/./]+[/../]");" is not correct. Kindly correct this expression.
The regex match should succeed for Path1 and Path2, but not for Path3.
Your [/./]+[/../] regex (equivalent to [/.]+[/.]) matches one or more / or . chars followed by a single / or . char. It can thus match ....../, /////////////, and certainly the // in the protocol part.
If you do not have to use a regex you may simply use .Contains:
if (s.Contains("/../") || s.Contains("/./")) { ... }
See this C# demo.
You may use the following regex, too:
bool b = Regex.IsMatch(OrginalURL, @"/\.{1,2}/");
See this regex demo.
Details
/ - a / char
\.{1,2} - 1 or 2 dots
/ - a / char.
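A small sketch of that check, wrapped in a helper so it can be run against the three paths from the question. DotSegmentCheck and HasDotSegment are illustrative names, not part of any library.

```csharp
using System.Text.RegularExpressions;

public static class DotSegmentCheck
{
    // Hypothetical helper: true when the URL contains "/./" or "/../".
    public static bool HasDotSegment(string url)
    {
        // "/" then one or two literal dots, then "/"
        return Regex.IsMatch(url, @"/\.{1,2}/");
    }
}
```

With the question's data, HasDotSegment should be true for Path1 and Path2 and false for Path3, because the dots in the IP address and in filename.txt are never enclosed by slashes on both sides.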
While this would not be the best way to do this task, an expression similar to:
\/\.{1,2}(?=\/)
might work.
Demo
Escaping is just for demoing purpose, you can remove those.
Test
using System;
using System.Text.RegularExpressions;
public class Example
{
public static void Main()
{
string pattern = @"\/\.{1,2}(?=\/)";
string input = @"https://18.56.199.56/Directory1/./Directory2/filename.txt
https://18.56.199.56/Directory1/././Directory2/filename.txt
https://18.56.199.56/Directory1/../../../Directory2/filename.txt
https://18.56.199.56/Directory1/./../.../Directory2/filename.txt
https://18.56.199.56/Directory1/Directory2/filename.txt";
RegexOptions options = RegexOptions.Multiline;
foreach (Match m in Regex.Matches(input, pattern, options))
{
Console.WriteLine("'{0}' found at index {1}.", m.Value, m.Index);
}
}
}

RegEx for capturing a word in between = and ;

I want to select word2 from the following :
word2;word3
word2 is the text between the start of the line and the ; unless there is a = in between. In that case, I want to start from the = instead of the start of the line,
like word2 from
word1=word2;word3
I have tried using this regex
(?<=\=|^).*?(?=;)
which selects word2 from
word2;word3
but also the whole word1=word2 from
word1=word2;word3
You can use an optional group to check for a word followed by an equals sign and capture the value in the first capturing group:
^(?:\w+=)?(\w+);
Explanation
^ Start of string
(?:\w+=)? Optional non capturing group matching 1+ word chars followed by =
(\w+) Capture in the first capturing group 1+ word chars
; Match ;
See a regex demo
In .NET you might also use:
(?<=^(?:\w+=)?)\w+(?=;)
Regex demo | C# demo
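As a rough sketch, the capturing-group variant can be used like this. WordExtractor and ExtractWord2 are hypothetical names, and returning null on no match is an assumption made for this example.

```csharp
using System.Text.RegularExpressions;

public static class WordExtractor
{
    // Hypothetical helper: returns the word before ';', skipping an optional "name=" prefix.
    public static string ExtractWord2(string line)
    {
        Match m = Regex.Match(line, @"^(?:\w+=)?(\w+);");
        // Groups[1] is the first capturing group; m.Value would include the prefix and the ';'
        return m.Success ? m.Groups[1].Value : null;
    }
}
```

Both word1=word2;word3 and word2;word3 should yield word2 here; an input without a ; yields null.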
There are many ways to do this, and regular expressions are probably among the last you should reach for.
But, if we wish to use an expression for this problem, let's start with a simple one and explore other options, maybe something similar to:
(.+=)?(.+?);
or
(.+=)?(.+?)(?:;.+)
where the second capturing group has our desired word2.
Demo 1
Demo 2
Example 1
using System;
using System.Text.RegularExpressions;
public class Example
{
public static void Main()
{
string pattern = @"(.+=)?(.+?);";
string input = @"word1=word2;word3
word2;word3";
RegexOptions options = RegexOptions.Multiline;
foreach (Match m in Regex.Matches(input, pattern, options))
{
Console.WriteLine("'{0}' found at index {1}.", m.Value, m.Index);
}
}
}
Example 2
using System;
using System.Text.RegularExpressions;
public class Example
{
public static void Main()
{
string pattern = @"(.+=)?(.+?)(?:;.+)";
string substitution = @"$2";
string input = @"word1=word2;word3
word2;word3";
RegexOptions options = RegexOptions.Multiline;
Regex regex = new Regex(pattern, options);
string result = regex.Replace(input, substitution);
}
}
Instead of using regular expressions, you can solve the problem with String class methods.
string[] words = str.Split(';');
string word2 = words[0].Substring(words[0].IndexOf('=') + 1);
The first line splits the input at ';'. Assuming there is only a single ';', this gives two strings. The second line returns the substring of the first part (words[0]) that starts one character after the first occurrence of '=' (words[0].IndexOf('=') + 1) and runs to the end. If the line has no '=' character, IndexOf returns -1, so the substring simply starts at index 0, the beginning.
Related documentation:
https://learn.microsoft.com/en-us/dotnet/api/system.string.split?view=netframework-4.8
https://learn.microsoft.com/en-us/dotnet/api/system.string.substring?view=netframework-4.8
https://learn.microsoft.com/en-us/dotnet/api/system.string.indexof?view=netframework-4.8
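The two String-method lines above can be wrapped into a helper to try both input shapes. This is only a sketch: SplitExtractor and ExtractWord2 are made-up names, and it assumes the line contains at most one ';'.

```csharp
public static class SplitExtractor
{
    // Hypothetical helper around the Split/Substring approach.
    public static string ExtractWord2(string line)
    {
        // Everything before the first ';'
        string[] words = line.Split(';');
        // IndexOf returns -1 when there is no '=', so the substring then starts at index 0
        return words[0].Substring(words[0].IndexOf('=') + 1);
    }
}
```

Both word1=word2;word3 and word2;word3 should give word2.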

How to use (?!...) regex pattern to skip the whole unmatched part?

I would like to use the ((?!(SEPARATOR)).)* regex pattern for splitting a string.
using System;
using System.Text.RegularExpressions;
public class Program
{
public static void Main()
{
var separator = "__";
var pattern = String.Format("((?!{0}).)*", separator);
var regex = new Regex(pattern);
foreach (var item in regex.Matches("first__second"))
Console.WriteLine(item);
}
}
It works fine when the SEPARATOR is a single character, but when it is longer than 1 character I get an unexpected result. In the code above the second matched string is "_second" instead of "second". How shall I modify my pattern to skip the whole unmatched separator?
My real problem is to split lines where I should skip line separators inside quotes. My line separator is not a predefined value and it can be for example "\r\n".
You can do something like this:
using System;
using System.Text.RegularExpressions;
public class Example
{
public static void Main()
{
string input = "plum--pear";
string pattern = "-"; // Split on hyphens
string[] substrings = Regex.Split(input, pattern);
foreach (string match in substrings)
{
Console.WriteLine("'{0}'", match);
}
}
}
// The method displays the following output:
// 'plum'
// ''
// 'pear'
The .NET regex engine does not support matching a piece of text other than a specific multicharacter string. In PCRE, you would use the (*SKIP)(*FAIL) verbs, but they are not supported in the native .NET regex library. You could use PCRE.NET, but .NET regex can usually handle those scenarios well with Regex.Split.
If you need to, say, match all but [anything here], you could use
var res = Regex.Split(s, @"\[[^][]*]").Where(m => !string.IsNullOrEmpty(m));
If the separator is a simple literal fixed string like __, just use String.Split.
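For the fixed-separator case, a minimal sketch with String.Split could look like the following. LiteralSplit and SplitOn are illustrative names; the overload taking a string array and StringSplitOptions is used here because it is available on both .NET Framework and newer runtimes.

```csharp
using System;

public static class LiteralSplit
{
    // Hypothetical helper: splits on a literal, possibly multi-character separator.
    public static string[] SplitOn(string input, string separator)
    {
        // The separator is treated as plain text, not as a regex pattern
        return input.Split(new[] { separator }, StringSplitOptions.None);
    }
}
```

Calling LiteralSplit.SplitOn("first__second", "__") should give the two items first and second, with no stray underscore.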
As for your real problem, it seems all you need is
var res = Regex.Matches(s, "(?:\"[^\"]*\"|[^\r\n\"])+")
.Cast<Match>()
.Select(m => m.Value)
.ToList();
See the regex demo
It matches 1+ (due to the final +) occurrences of ", 0+ chars other than " and then " (the "[^"]*" branch) or (|) any char but CR, LF or/and " (see [^\r\n"]).
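To see the effect on a line break inside quotes, here is a small sketch built around the same pattern. QuotedLineSplitter and SplitLines are hypothetical names, and the sample input in the note below is invented for illustration.

```csharp
using System.Collections.Generic;
using System.Linq;
using System.Text.RegularExpressions;

public static class QuotedLineSplitter
{
    // Hypothetical helper: returns the lines of s, keeping CR/LF that sit inside double quotes.
    public static List<string> SplitLines(string s)
    {
        // Either a whole quoted run (which may span lines) or any char except CR, LF and "
        return Regex.Matches(s, "(?:\"[^\"]*\"|[^\r\n\"])+")
            .Cast<Match>()
            .Select(m => m.Value)
            .ToList();
    }
}
```

For the input a,"b\r\nc"\r\nd (with real CR/LF characters), this should return two items: the first keeps the quoted line break, the second is d. Note that an unbalanced quote is simply skipped by this pattern, so malformed input would need separate handling.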

Get Removed characters from string

I am using Regex to remove unwanted characters from string like below:
str = System.Text.RegularExpressions.Regex.Replace(str, @"[^\u0020-\u007E]", "");
How can I retrieve the distinct characters that will be removed in an efficient way?
EDIT:
Sample input : str = "This☺ contains Åüsome æspecialæ characters"
Sample output : str = "This contains some special characters"
removedchar = "☺,Å,ü,æ"
string pattern = @"[^\u0020-\u007E]";
Regex rgx = new Regex(pattern);
List<string> matches = new List<string> ();
foreach (Match match in rgx.Matches(str))
{
if (!matches.Contains (match.Value))
{
matches.Add (match.Value);
}
}
Here is an example how you can do it with a callback method inside the Regex.Replace overload with an evaluator:
evaluator
Type: System.Text.RegularExpressions.MatchEvaluator
A custom method that examines each match and returns either the original matched string or a replacement string.
C# demo:
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;
public class Test
{
public static List<string> characters = new List<string>();
public static void Main()
{
var str = Regex.Replace("§My string 123”˝", "[^\u0020-\u007E]", Repl);//""
Console.WriteLine(str); // => My string 123
Console.WriteLine(string.Join(", ", characters)); // => §, ”, ˝
}
public static string Repl(Match m)
{
characters.Add(m.Value);
return string.Empty;
}
}
See IDEONE demo
In short, declare a "global" variable (a list of strings, here, characters), initialize it. Add the Repl method to handle the replacement, and when Regex.Replace calls that method, add each matched value to the characters list.
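The same evaluator idea can also be sketched without a static field, by passing a lambda and collecting only distinct characters, which is what the question asks for. RemovedCharCollector and Clean are made-up names, and the out parameter is just one possible way to hand back the list.

```csharp
using System.Collections.Generic;
using System.Text.RegularExpressions;

public static class RemovedCharCollector
{
    // Hypothetical helper: strips everything outside U+0020..U+007E and reports what was stripped.
    public static string Clean(string input, out List<string> removed)
    {
        var seen = new List<string>(); // keeps first-seen order, no duplicates
        string cleaned = Regex.Replace(input, @"[^\u0020-\u007E]", m =>
        {
            if (!seen.Contains(m.Value))
            {
                seen.Add(m.Value);
            }
            return string.Empty; // drop the matched character
        });
        removed = seen;
        return cleaned;
    }
}
```

For the sample input from the question, the cleaned string should be "This contains some special characters" and the removed list should hold ☺, Å, ü, æ, each once. For long inputs with many distinct characters, a HashSet would make the duplicate check cheaper.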

How can C# Regex capture everything between *| and |*?

In C#, I need to capture variablename in the phrase *|variablename|*.
I've got this RegEx: Regex regex = new Regex(@"\*\|(.*)\|\*");
Online regex testers return "variablename", but in C# code, it returns *|variablename|*, or the string including the star and bar characters. Anyone know why I'm experiencing this return value?
Thanks much!
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text;
using System.Text.RegularExpressions;
namespace RegExTester
{
class Program
{
static void Main(string[] args)
{
String teststring = "This is a *|variablename|*";
Regex regex = new Regex(@"\*\|(.*)\|\*");
Match match = regex.Match(teststring);
Console.WriteLine(match.Value);
Console.Read();
}
}
}
//Outputs *|variablename|*, instead of variablename
match.Value contains the entire match. This includes the delimiters since you specified them in your regex. When I test your regex and input with RegexPal, it highlights *|variablename|*.
You want to get only the capture group (the stuff in the brackets), so use match.Groups[1]:
String teststring = "This is a *|variablename|*";
Regex regex = new Regex(@"\*\|(.*)\|\*");
Match match = regex.Match(teststring);
Console.WriteLine(match.Groups[1]);
