What should my regular expression look like if I want to validate that $/Folder1/Folder2/Folder3/File.xml always starts with $ and always ends with xml
"$/Folder1/Folder2/Folder3/File.xml"
Pass
"$/Folder1/Folder2/Folder3/File.xm"
Fail
"$/Folder1/Folder2/Folder3/File.py"
Fail
"A/Folder1/Folder2/Folder3/File.xml"
Fail
Edit... So... The right regular expression is...
"^\$.*xml$"
The method, after adding the regex check, looks like...
public bool ValidateConfigPath(string config)
{
    var match = Regex.Match(config, @"^\$.*xml$", RegexOptions.IgnoreCase);
    return match.Success;
}
And all my unit tests pass...
[TestMethod]
public void ValidateConfigPath_InCorrect1()
{
var t = new TfsWrapper();
var isValid = t.ValidateConfigPath("$/Quantz/Main/CSS Calculator/main.py");
Assert.IsFalse(isValid);
}
[TestMethod]
public void ValidateConfigPath_InCorrect2()
{
var t = new TfsWrapper();
var isValid = t.ValidateConfigPath("C:/Quantz/Main/CSS Calculator/main.xml");
Assert.IsFalse(isValid);
}
[TestMethod]
public void ValidateConfigPath_Correct()
{
var t = new TfsWrapper();
var isValid = t.ValidateConfigPath("$/Quantz/Main/CSS Calculator/main.xml");
Assert.IsTrue(isValid);
}
If there's not a strict requirement for using regular expressions, I recommend the more straight-forward approach of simply checking the starting and ending characters:
config.StartsWith("$") && config.EndsWith("xml")
With the above, the intent is absolutely clear to anyone, including people who don't understand regular expressions.
Have you read a tutorial?
^\$.*xml$
^ is the beginning of the string. \$ is a literal $ character. .* is 0 or more arbitrary characters (strictly, any character except a line break, but that does not matter for your example input). xml is just the literal text xml. And $ is the end of the string.
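To see the pieces of that pattern working together, here is a minimal sketch run against the four sample paths from the question. The names ConfigPathDemo and IsConfigPath are made up for illustration, and the case-insensitive option is borrowed from the asker's own method rather than required by the pattern.

```csharp
using System;
using System.Text.RegularExpressions;

public static class ConfigPathDemo
{
    // Hypothetical helper: true when the input starts with '$' and ends with "xml".
    public static bool IsConfigPath(string input) =>
        Regex.IsMatch(input, @"^\$.*xml$", RegexOptions.IgnoreCase);

    public static void Main()
    {
        string[] samples =
        {
            "$/Folder1/Folder2/Folder3/File.xml", // expected: pass
            "$/Folder1/Folder2/Folder3/File.xm",  // expected: fail
            "$/Folder1/Folder2/Folder3/File.py",  // expected: fail
            "A/Folder1/Folder2/Folder3/File.xml"  // expected: fail
        };

        foreach (var sample in samples)
        {
            Console.WriteLine($"{sample} -> {IsConfigPath(sample)}");
        }
    }
}
```

Two things worth knowing if you tighten this later: the pattern accepts any text ending in the letters xml (so a name like File.oxml passes; \.xml$ would require the dot), and in .NET the $ anchor also matches just before a trailing newline, whereas \z matches only at the very end.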
Try this:-
^\$.*xml$
Check this link for details
I am trying to learn some .NET 6 and C# and I am struggling a lot with regular expressions. More specifically with Avalonia on Windows, if that is relevant.
I am trying to do a small app with 2 textboxes. I write text on one and get the text "filtered" in the other one using a value converter.
I would like to filter math expressions to try to solve them later on. Something simple, kind of a way of writing text math and getting results real time.
I have been trying for several weeks to figure this regular expression on my own with no success whatsoever.
I would like to replace in my string "_Expression{BLABLA}" for "BLABLA". For testing my expressions I have been checking in http://regexstorm.net/ and https://regex101.com/ and according to them my matches should be correct (unless I misunderstood the results). But the results in my little app are extremely odd to me and I finally decided to ask for help.
Here is my code:
private static string? FilterStr(object value)
{
    if (value is string str)
    {
        string pattern = @"\b_Expression{(.+?)\w*}";
        Regex rgx = new(pattern);
        foreach (Match match in rgx.Matches(str))
        {
            string aux = "";
            aux = match.Value;
            aux = Regex.Replace(aux, @"_Expression{", "");
            aux = Regex.Replace(aux, @"[\}]", "");
            str = Regex.Replace(str, match.Value, aux);
        }
        return new string(str);
    }
    return null;
}
Then the results for some sample inputs are:
Input:
Some text
_Expression{x}
_Expression{1}
_Expression{4}
_Expression{4.5} _Expression{4+4}
_Expression{4-4} _Expression{4*x}
_Expression{x/x}
_Expression{x^4}
_Expression{sin(x)}
Output:
Some text
x
1{1}
1{4}
1{4.5} 1{4+4}
1{4-4} 1{4*x}
1{x/x}
1{x^4}
1{sin(x)}
or
Input:
Some text
_Expression{x}
_Expression{4}
_Expression{4.5} _Expression{4+4}
_Expression{4-4} _Expression{4*x}
_Expression{x/x}
_Expression{x^4}
_Expression{sin(x)}
Output:
Some text
x
_Expression{4}
4.5 _Expression{4+4}
4-4 _Expression{4*x}
x/x
_Expression{x^4}
_Expression{sin(x)}
This behaviour is very confusing to me. I can't see why "(.+?)" works with some of them and not with others... Or maybe I haven't defined something properly, or my Replace is wrong? I can't see it...
Thanks a lot for the time! :)
There are some missing parts in your regular expression, for example it doesn't have the curly braces { and } escaped, since curly braces have a special meaning in a regular expression; they are used as quantifiers.
Use the one below.
For extracting the math expression between the curly braces, it uses a named capturing group with name mathExpression.
_Expression\{(?<mathExpression>.+?)\}
_Expression\{ : start with the fixed text _Expression{
(?<mathExpression> : start a named capturing group with name mathExpression
.+? : take the next characters in a non greedy way
) : end the named capturing group
\} : end with the fixed character }
The below example will output 2 matches
Regex regex = new(@"_Expression\{(?<mathExpression>.+?)\}");
var matches = regex.Matches(@"_Expression{4.5} _Expression{4+4}");
foreach (Match match in matches.Where(o => o.Success))
{
    var mathExpression = match.Groups["mathExpression"];
    Console.WriteLine(mathExpression);
}
Output
4.5
4+4
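If the goal is the replacement from the question rather than just listing the matches, the same named group can be referenced from a substitution string. The following is only a sketch; ExpressionFilter and Strip are hypothetical names. It also sidesteps one likely cause of the odd results in the question: passing match.Value as the pattern to Regex.Replace makes characters such as +, ^, ( and {4} act as regex syntax unless the text is first passed through Regex.Escape.

```csharp
using System;
using System.Text.RegularExpressions;

public static class ExpressionFilter
{
    private static readonly Regex Wrapper =
        new Regex(@"_Expression\{(?<mathExpression>.+?)\}");

    // Replaces every "_Expression{...}" with just the text between the braces.
    // "${mathExpression}" is the .NET substitution syntax for a named group.
    public static string Strip(string input) =>
        Wrapper.Replace(input, "${mathExpression}");

    public static void Main()
    {
        Console.WriteLine(Strip("_Expression{4.5} _Expression{4+4}")); // 4.5 4+4
        Console.WriteLine(Strip("_Expression{sin(x)}"));               // sin(x)
    }
}
```

Because the whole input is processed in a single Replace call, no matched text is ever reinterpreted as a pattern.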
I am struggling to find a pattern that solves the problem in one step.
The string looks like this:
Text1
Text1$Text2$Text3
Text1$Text2$Text3$Text4$Text5$Text6 etc.
What I want is: take up to 4x Text. If there are more than 4x Text, take only the last character of each remaining one.
Example:
Text1$Text2$Text3$Text4$Text5$Text6 -> Text1$Text2$Text3$Text4&56
My current solution is:
First pattern:
^([^\$]*)\$?([^\$]*)\$?([^\$]*)\$?([^\$]*)\$?
After this i will do a substitution with the first pattern
New string: Text5$Text6
second pattern is:
([^\$])\b
result: 56
combine both and get the result:
Text1$Text2$Text3$Text4$56
It is not clear to me why I can't simply put the second pattern after the first to form one pattern. Is there something like an anchor that tells the engine to start matching from here, as it would if this were the only pattern?
You might use an alternation with a positive lookbehind and then concatenate the matches.
(?<=^(?:[^$]+\$){0,3})[^$]+\$?|[^$](?=\$|$)
Explanation
(?<= Positive lookbehind, assert what is on the left is
^(?:[^$]+\$){0,3} From the start of the string, match 0-3 repetitions of: 1+ chars other than $, followed by a $
) Close lookbehind
[^$]+\$? Match 1+ times any char except $, then match an optional $
| Or
[^$] Match any char except $
(?=\$|$) Positive lookahead, assert what is directly to the right is either $ or the end of the string
Example
string pattern = @"(?<=^(?:[^$]*\$){0,3})[^$]*\$?|[^$](?=\$|$)";
string[] strings = {
"Text1",
"Text1$Text2$Text3",
"Text1$Text2$Text3$Text4$Text5$Text6"
};
Regex regex = new Regex(pattern);
foreach (String s in strings) {
Console.WriteLine(string.Join("", from Match match in regex.Matches(s) select match.Value));
}
Output
Text1
Text1$Text2$Text3
Text1$Text2$Text3$Text4$56
I strongly believe regular expression isn't the way to do that. Mostly because of the readability.
You may consider using simple algorithm like this one to reach your goal:
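using System;
public class Program
{
    public static void Main()
    {
        var input = "Text1$Text2$Text3$Text4$Text5$Text6";
        var parts = input.Split('$');
        var result = "";
        for (var i = 0; i < parts.Length; i++)
        {
            // Keep the first four parts whole (each followed by '$');
            // for the rest, keep only what comes after the 4-letter "Text" prefix.
            result += (i < 4 ? parts[i] + "$" : parts[i].Substring(4));
        }
        // Prints Text1$Text2$Text3$Text4$56
        // (inputs with four or fewer parts keep a trailing '$').
        Console.WriteLine(result);
    }
}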
There are also linq alternatives :
using System;
using System.Linq;
public class Program
{
public static void Main()
{
var input = "Text1$Text2$Text3$Text4$Text5$Text6";
var parts = input.Split('$');
var first4 = parts.Take(4);
var remainings = parts.Skip(4);
var result2 = string.Join("$", first4) + "$" + string.Join("", remainings.Select( r=>r.Substring(4)));
Console.WriteLine(result2);
}
}
It has to be adjusted to the actual needs but the idea is there
Try this code:
var texts = new string[] {"Text1", "Text1$Text2$Text3", "Text1$Text2$Text3$Text4$Text5$Text6" };
var parsed = texts
.Select(s => Regex.Replace(s,
@"(Text\d{1,3}(?:\$Text\d{1,3}){0,3})((?:\$Text\d{1,3})*)",
(match) => match.Groups[1].Value +"$"+ match.Groups[2].Value.Replace("Text", "").Replace("$", "")
)).ToArray();
// parsed is now: string[3] { "Text1$", "Text1$Text2$Text3$", "Text1$Text2$Text3$Text4$56" }
Explanation:
solution uses regex pattern: (Text\d{1,3}(?:\$Text\d{1,3}){0,3})((?:\$Text\d{1,3})*)
(...) - first capturing group
(?:...) - non-capturing group
Text\d{1,3}(?:\$Text\d{1,3}) - match Text literally, then \d{1,3}, which is one to three digits; \$ matches $ literally
Rest is just repetition of it. Basically, first group captures first four pieces, second group captures the rest, if any.
We also use MatchEvaluator here which is delegate type defined as:
public delegate string MatchEvaluator(Match match);
We define such method:
(match) => match.Groups[1].Value +"$"+ match.Groups[2].Value.Replace("Text", "").Replace("$", "")
We use it to evaluate each match: take the first capturing group and concatenate it with the second, removing the unnecessary text.
It's not clear to me whether your goal can be achieved using exclusively regex. If nothing else, the fact that you want to introduce a new character '&' into the output adds to the challenge, since just plain matching would never be able to accomplish that. Possibly using the Replace() method? I'm not sure that would work though...using only a replacement pattern and not a MatchEvaluator, I don't see a way to recognize but still exclude the "$Text" portion from the fifth instance and later.
But, if you are willing to mix regex with a small amount of post-processing, you can definitely do it:
static readonly Regex regex1 = new Regex(@"(Text\d(?:\$Text\d){0,3})(?:\$Text(\d))*", RegexOptions.Compiled);
static void Main(string[] args)
{
for (int i = 1; i <= 6; i++)
{
string text = string.Join("$", Enumerable.Range(1, i).Select(j => $"Text{j}"));
WriteLine(KeepFour(text));
}
}
private static string KeepFour(string text)
{
Match match = regex1.Match(text);
if (!match.Success)
{
return "[NO MATCH]";
}
StringBuilder result = new StringBuilder();
result.Append(match.Groups[1].Value);
if (match.Groups[2].Captures.Count > 0)
{
result.Append("&");
// Have to iterate (join), because we don't want the whole match,
// just the captured text.
result.Append(JoinCaptures(match.Groups[2]));
}
return result.ToString();
}
private static string JoinCaptures(Group group)
{
return string.Join("", group.Captures.Cast<Capture>().Select(c => c.Value));
}
The above breaks your requirement into two capture groups in a regex: the first holds up to four whole items, and the second captures the digit of each remaining item. It then composes the result from the captured text.
What is the regular expression (in JavaScript if it matters) to only match if the text is an exact match? That is, there should be no extra characters at other end of the string.
For example, if I'm trying to match for abc, then 1abc1, 1abc, and abc1 would not match.
Use the start and end delimiters: ^abc$
It depends. You could
string.match(/^abc$/)
But that would not match the following string: 'the first 3 letters of the alphabet are abc. not abc123'
I think you would want to use \b (word boundaries):
var str = 'the first 3 letters of the alphabet are abc. not abc123';
var pat = /\b(abc)\b/g;
console.log(str.match(pat));
Live example: http://jsfiddle.net/uu5VJ/
If the former solution works for you, I would advise against using it.
That means you may have something like the following:
var strs = ['abc', 'abc1', 'abc2']
for (var i = 0; i < strs.length; i++) {
if (strs[i] == 'abc') {
//do something
}
else {
//do something else
}
}
While you could use
if (strs[i].match(/^abc$/g)) {
//do something
}
It would be considerably more resource-intensive. For me, a general rule of thumb is for a simple string comparison use a conditional expression, for a more dynamic pattern use a regular expression.
More on JavaScript regexes: https://developer.mozilla.org/en/JavaScript/Guide/Regular_Expressions
Use "^" for the beginning of the line and "$" for the end of it. E.g.:
var re = /^abc$/;
Would match "abc" but not "1abc" or "abc1". You can learn more at https://developer.mozilla.org/en-US/docs/Web/JavaScript/Guide/Regular_Expressions
I want to ask about regular expressions in C#.
I have a string, e.g.: "{Welcome to {stackoverflow}. This is a question C#}"
Any idea for a regular expression to get the content between {}? I want to get 2 strings: "Welcome to stackoverflow. This is a question C#" and "stackoverflow".
Thanks in advance, and sorry about my English.
Hi, I wouldn't know how to do that with a single regular expression, but it becomes easier if you apply a simple one repeatedly:
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;
static class Program {
static void Main() {
string test = "{Welcome to {stackoverflow}. This is a question C#}";
// get whatever is not a '{' between braces, non greedy
Regex regex = new Regex("{([^{]*?)}", RegexOptions.Compiled);
// the contents found
List<string> contents = new List<string>();
// flag to determine if we found matches
bool matchesFound = false;
// start finding innermost matches, and replace them with their
// content, removing braces
do {
matchesFound = false;
// replace with a MatchEvaluator that adds the content to our
// list.
test = regex.Replace(test, (match) => {
matchesFound = true;
var replacement = match.Groups[1].Value;
contents.Add(replacement);
return replacement;
});
} while (matchesFound);
foreach (var content in contents) {
Console.WriteLine(content);
}
}
}
I've written a little regex but haven't tested it; you can try something like this:
Regex reg = new Regex("{(.*{(.*)}.*)}");
...and build up on it.
Thanks, everybody. I have a solution: I use a stack instead of a regular expression. I push the index of each "{" onto the stack, and when I meet a "}", I pop the matching "{" index and take the string from that index to the index of the "}". Thanks again.
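The stack approach described above might look roughly like the following. This is a reconstruction from the description, not the asker's actual code: BraceExtractor and Extract are invented names, unmatched braces are simply ignored, and nested braces are stripped from each result so that the output matches the two strings asked for in the question.

```csharp
using System;
using System.Collections.Generic;

public static class BraceExtractor
{
    // Returns the content of every balanced {...} pair, innermost first.
    public static List<string> Extract(string text)
    {
        var openings = new Stack<int>();   // indexes of '{' not yet closed
        var results = new List<string>();

        for (int i = 0; i < text.Length; i++)
        {
            if (text[i] == '{')
            {
                openings.Push(i);
            }
            else if (text[i] == '}' && openings.Count > 0)
            {
                int start = openings.Pop() + 1;
                string content = text.Substring(start, i - start);
                // Drop nested braces so outer results read as plain text.
                results.Add(content.Replace("{", "").Replace("}", ""));
            }
        }

        return results;
    }

    public static void Main()
    {
        foreach (var s in Extract("{Welcome to {stackoverflow}. This is a question C#}"))
        {
            Console.WriteLine(s);
        }
        // stackoverflow
        // Welcome to stackoverflow. This is a question C#
    }
}
```

A single pass over the string is enough, and arbitrary nesting depth is handled without any pattern at all.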
What's the regular expression to check if a string starts with "mailto" or "ftp" or "joe" or...
Now I am using C# and code like this in a big if with many ors:
String.StartsWith("mailto:")
String.StartsWith("ftp")
It looks like a regex would be better for this. Or is there a C# way I am missing here?
You could use:
^(mailto|ftp|joe)
But to be honest, StartsWith is perfectly fine here. You could rewrite it as follows:
string[] prefixes = { "http", "mailto", "joe" };
string s = "joe:bloggs";
bool result = prefixes.Any(prefix => s.StartsWith(prefix));
You could also look at the System.Uri class if you are parsing URIs.
The following will match on any string that starts with mailto, ftp or http:
Regex reg = new Regex("^(mailto|ftp|http)");
To break it down:
^ matches the start of the string (or of each line in multiline mode)
(mailto|ftp|http) matches any of the items separated by a |
I would find StartsWith to be more readable in this case.
The StartsWith method will be faster, as there is no overhead of interpreting a regular expression, but here is how you do it:
if (Regex.IsMatch(theString, "^(mailto|ftp|joe):")) ...
The ^ matches the start of the string. You can put any protocols between the parentheses, separated by | characters.
edit:
Another approach that is much faster, is to get the start of the string and use in a switch. The switch sets up a hash table with the strings, so it's faster than comparing all the strings:
int index = theString.IndexOf(':');
if (index != -1) {
switch (theString.Substring(0, index)) {
case "mailto":
case "ftp":
case "joe":
// do something
break;
}
}
For the extension method fans:
public static bool RegexStartsWith(this string str, params string[] patterns)
{
return patterns.Any(pattern =>
Regex.Match(str, "^("+pattern+")").Success);
}
Usage
var answer = str.RegexStartsWith("mailto","ftp","joe");
//or
var answer2 = str.RegexStartsWith("mailto|ftp|joe");
//or
bool startsWithWhiteSpace = " does this start with space or tab?".RegexStartsWith(@"\s");
I really recommend using the String.StartsWith method over the Regex.IsMatch if you only plan to check the beginning of a string.
Firstly, a regular expression in C# is a language within a language, which does not help understanding or code maintenance. A regular expression is a kind of DSL.
Secondly, many developers do not understand regular expressions; they are hard for many people to read.
Thirdly, the StartsWith method offers culture-dependent comparison, which regular expressions are not aware of.
In your case you should use regular expressions only if you plan implementing more complex string comparison in the future.
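To illustrate the third point, StartsWith has an overload that takes a StringComparison, which makes the comparison rules explicit. A small sketch, with PrefixCheck and HasKnownPrefix as made-up names and a prefix list modeled on the one in the question:

```csharp
using System;
using System.Linq;

public static class PrefixCheck
{
    // Illustrative prefixes, modeled on the question.
    private static readonly string[] Prefixes = { "mailto:", "ftp:", "joe:" };

    // Ordinal, case-insensitive comparison: no culture rules are involved.
    public static bool HasKnownPrefix(string s) =>
        Prefixes.Any(p => s.StartsWith(p, StringComparison.OrdinalIgnoreCase));

    public static void Main()
    {
        Console.WriteLine(HasKnownPrefix("MAILTO:someone@example.com")); // True
        Console.WriteLine(HasKnownPrefix("http://example.com"));         // False
    }
}
```

Swapping in StringComparison.CurrentCulture or StringComparison.InvariantCulture changes only the comparison rules, not the shape of the code.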
You can get the substring before ':' using a range (array slice) and String.IndexOf, which returns -1 if the searched substring does not exist. You can then compare the result against constants using logical patterns (C# 9.0+) to check that the string really starts with one of the expected prefixes.
string s = "ftp:custom";
int index = s.IndexOf(':');
bool result = index > 0 && s[..index] is "mailto" or "ftp" or "joe";