Regular expression get string between curly braces - c#

I want to ask about regular expressions in C#.
I have a string, e.g. "{Welcome to {stackoverflow}. This is a question C#}"
Is there a regular expression that gets the content between {}? I want to get two strings: "Welcome to stackoverflow. This is a question C#" and "stackoverflow".
Thanks in advance, and sorry about my English.

I wouldn't know how to do that with a single regular expression, but it gets easier with a little iteration, replacing the innermost braces first:
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;
static class Program {
    static void Main() {
        string test = "{Welcome to {stackoverflow}. This is a question C#}";
        // get whatever is not a '{' between braces, non greedy
        Regex regex = new Regex("{([^{]*?)}", RegexOptions.Compiled);
        // the contents found
        List<string> contents = new List<string>();
        // flag to determine if we found matches
        bool matchesFound = false;
        // start finding innermost matches, and replace them with their
        // content, removing braces
        do {
            matchesFound = false;
            // replace with a MatchEvaluator that adds the content to our
            // list.
            test = regex.Replace(test, (match) => {
                matchesFound = true;
                var replacement = match.Groups[1].Value;
                contents.Add(replacement);
                return replacement;
            });
        } while (matchesFound);
        foreach (var content in contents) {
            Console.WriteLine(content);
        }
    }
}

I've written a little regex but haven't tested it; you can try something like this:
Regex reg = new Regex("{(.*{(.*)}.*)}");
...and build up on it.

Thanks, everybody. I have a solution: I use a stack instead of a regular expression. I push the index of each "{" onto the stack, and when I meet a "}", I pop the matching "{" index and take the string from that index to the index of the "}". Thanks again.
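The stack approach described above might look like the following sketch. The names BraceExtractor and ExtractBraceContents are made up for illustration, and it assumes that unmatched braces should simply be ignored and that inner braces should be stripped from the outer result.

```csharp
using System;
using System.Collections.Generic;

static class BraceExtractor
{
    // Hypothetical helper: returns the text between each matched pair of
    // braces, innermost pairs first, with any nested braces stripped.
    public static List<string> ExtractBraceContents(string input)
    {
        var results = new List<string>();
        var openIndexes = new Stack<int>(); // positions of '{' not yet closed

        for (int i = 0; i < input.Length; i++)
        {
            if (input[i] == '{')
            {
                openIndexes.Push(i);
            }
            else if (input[i] == '}' && openIndexes.Count > 0)
            {
                // Pop the matching '{' and take everything after it up to this '}'.
                int start = openIndexes.Pop() + 1;
                string content = input.Substring(start, i - start);
                // Assumption: inner braces are dropped from the outer string.
                results.Add(content.Replace("{", "").Replace("}", ""));
            }
        }
        return results;
    }

    static void Main()
    {
        string test = "{Welcome to {stackoverflow}. This is a question C#}";
        foreach (string content in ExtractBraceContents(test))
        {
            Console.WriteLine(content);
        }
        // prints:
        // stackoverflow
        // Welcome to stackoverflow. This is a question C#
    }
}
```

Unlike the repeated-replace version, this makes a single pass over the input and never builds a regex.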

Related

Regular expression issues in .net 6 value converter

I am trying to learn some .NET 6 and C#, and I am struggling a lot with regular expressions. More specifically, I am using Avalonia on Windows, if that is relevant.
I am trying to write a small app with 2 textboxes. I write text in one and get the text "filtered" in the other one using a value converter.
I would like to filter math expressions so I can try to solve them later on. Something simple: a way of writing math as text and getting results in real time.
I have been trying for several weeks to figure out this regular expression on my own, with no success whatsoever.
I would like to replace "_Expression{BLABLA}" in my string with "BLABLA". For testing my expressions I have been checking http://regexstorm.net/ and https://regex101.com/, and according to them my matches should be correct (unless I misunderstood the results). But the results in my little app are extremely odd to me, and I finally decided to ask for help.
Here is my code:
private static string? FilterStr(object value)
{
    if (value is string str)
    {
        string pattern = @"\b_Expression{(.+?)\w*}";
        Regex rgx = new(pattern);
        foreach (Match match in rgx.Matches(str))
        {
            string aux = "";
            aux = match.Value;
            aux = Regex.Replace(aux, @"_Expression{", "");
            aux = Regex.Replace(aux, @"[\}]", "");
            str = Regex.Replace(str, match.Value, aux);
        }
        return new string(str);
    }
    return null;
}
Then the results for some sample inputs are:
Input:
Some text
_Expression{x}
_Expression{1}
_Expression{4}
_Expression{4.5} _Expression{4+4}
_Expression{4-4} _Expression{4*x}
_Expression{x/x}
_Expression{x^4}
_Expression{sin(x)}
Output:
Some text
x
1{1}
1{4}
1{4.5} 1{4+4}
1{4-4} 1{4*x}
1{x/x}
1{x^4}
1{sin(x)}
or
Input:
Some text
_Expression{x}
_Expression{4}
_Expression{4.5} _Expression{4+4}
_Expression{4-4} _Expression{4*x}
_Expression{x/x}
_Expression{x^4}
_Expression{sin(x)}
Output:
Some text
x
_Expression{4}
4.5 _Expression{4+4}
4-4 _Expression{4*x}
x/x
_Expression{x^4}
_Expression{sin(x)}
This behaviour is very confusing to me. I can't see why "(.+?)" works with some of them and not with others... Or maybe I haven't defined something properly, or my Replace is wrong? I can't see it...
Thanks a lot for the time! :)
There are some missing parts in your regular expression; for example, the curly braces { and } are not escaped. Curly braces have a special meaning in a regular expression: they delimit quantifiers such as {1} or {2,4}.
Use the one below.
For extracting the math expression between the curly braces, it uses a named capturing group with name mathExpression.
_Expression\{(?<mathExpression>.+?)\}
_Expression\{ : start with the fixed text _Expression{
(?<mathExpression> : start a named capturing group with name mathExpression
.+? : take the next characters in a non greedy way
) : end the named capturing group
\} : end with the fixed character }
The example below outputs 2 matches:
Regex regex = new(@"_Expression\{(?<mathExpression>.+?)\}");
var matches = regex.Matches(@"_Expression{4.5} _Expression{4+4}");
foreach (Match match in matches.Where(o => o.Success))
{
    var mathExpression = match.Groups["mathExpression"];
    Console.WriteLine(mathExpression);
}
Output
4.5
4+4
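As for the odd output in the question: the loop calls Regex.Replace(str, match.Value, aux), which treats the matched text itself as a pattern. That appears to explain the results, since `_Expression{1}` is then read as "`_Expressio` followed by `n` exactly once" and so rewrites every bare `_Expression`, while characters such as `+`, `*`, `^` and `(` in the other inputs are reinterpreted as regex syntax. A minimal sketch that avoids the loop entirely by doing one Regex.Replace with a `${mathExpression}` substitution follows. The class name ExpressionFilter is invented here, and the sketch assumes the braces are never nested.

```csharp
using System;
using System.Text.RegularExpressions;

static class ExpressionFilter
{
    // Same pattern as in the answer above, with the braces escaped.
    private static readonly Regex ExpressionRegex =
        new Regex(@"_Expression\{(?<mathExpression>.+?)\}");

    // Replaces every _Expression{...} with the text captured between the braces.
    public static string FilterStr(string str)
    {
        return ExpressionRegex.Replace(str, "${mathExpression}");
    }

    static void Main()
    {
        Console.WriteLine(FilterStr("_Expression{4.5} _Expression{4+4}")); // prints: 4.5 4+4
        Console.WriteLine(FilterStr("_Expression{sin(x)}"));               // prints: sin(x)
    }
}
```

If the matched text ever does have to be reused as a pattern, wrapping it in Regex.Escape first keeps its characters literal.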

c# regex clarification [duplicate]

What is the regular expression (in JavaScript, if it matters) to match only if the text is an exact match? That is, there should be no extra characters at either end of the string.
For example, if I'm trying to match for abc, then 1abc1, 1abc, and abc1 would not match.
Use the start and end delimiters: ^abc$
It depends. You could
string.match(/^abc$/)
But that would not match the following string: 'the first 3 letters of the alphabet are abc. not abc123'
I think you would want to use \b (word boundaries):
var str = 'the first 3 letters of the alphabet are abc. not abc123';
var pat = /\b(abc)\b/g;
console.log(str.match(pat));
Live example: http://jsfiddle.net/uu5VJ/
If the former solution (^abc$) works for you, I would advise against using a regular expression at all.
It means you have something like the following:
var strs = ['abc', 'abc1', 'abc2']
for (var i = 0; i < strs.length; i++) {
    if (strs[i] == 'abc') {
        //do something
    }
    else {
        //do something else
    }
}
While you could use
if (strs[i].match(/^abc$/g)) {
    //do something
}
it would be considerably more resource-intensive. As a general rule of thumb, use a plain comparison for a simple string match and a regular expression for a more dynamic pattern.
More on JavaScript regexes: https://developer.mozilla.org/en/JavaScript/Guide/Regular_Expressions
"^" marks the beginning of the line and "$" the end of it. E.g.:
var re = /^abc$/;
Would match "abc" but not "1abc" or "abc1". You can learn more at https://developer.mozilla.org/en-US/docs/Web/JavaScript/Guide/Regular_Expressions
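Since the title mentions C#, here is a rough .NET equivalent of the two patterns discussed above. One .NET detail worth knowing: without RegexOptions.Multiline, `$` also matches just before a final newline, so this sketch uses `\z` for a strict end-of-string check. The class and method names are made up for illustration.

```csharp
using System;
using System.Text.RegularExpressions;

static class ExactMatchDemo
{
    // True only when the whole input is exactly "abc".
    public static bool IsExactly(string input) => Regex.IsMatch(input, @"^abc\z");

    // True when "abc" appears anywhere as a whole word.
    public static bool ContainsWord(string input) => Regex.IsMatch(input, @"\babc\b");

    static void Main()
    {
        Console.WriteLine(IsExactly("abc"));   // True
        Console.WriteLine(IsExactly("1abc"));  // False
        Console.WriteLine(IsExactly("abc1"));  // False
        Console.WriteLine(IsExactly("abc\n")); // False, although ^abc$ would match here
        Console.WriteLine(ContainsWord("the first 3 letters of the alphabet are abc. not abc123")); // True
    }
}
```

For a fixed string like this, a plain `input == "abc"` comparison does the same job as IsExactly without a regex.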

Regular Expression - Get partial string

I have a list of project names that I need to do some matching on. The list of projects could look something like this:
suzu
suzu-domestic
suzu-international
suzuran
suzuran-international
scorpion
scorpion-default
yada
yada-yada
etc
If the searched for project is suzu, I'd like to have the following result from the list:
suzu
suzu-domestic
suzu-international
but not anything containing suzuran. I would also like to get the following matches if the searched-for project is suzuran
suzuran
suzuran-international
but not anything containing suzu.
In C# code I have something similar to this:
String searchForProject = "suzu";
String regStr = @"THE_REGEX_GOES_HERE"; // The regStr will be in a config file
List<Project> projects = DataWrapper.GetAllProjects();
Regex regEx = new Regex(String.Format(regStr, searchForProject));
var result = new List<Project>();
foreach (Project proj in projects)
{
    if (regEx.IsMatch(proj.ProjectName))
    {
        result.Add(proj);
    }
}
The question is: can I have a regexp that matches all the exact project names, but not the ones that a StartsWith equivalent would also return?
(Today I have regStr = @"^({0})#", but this does not satisfy the above scenario since it gives more hits than it should.)
I'd appreciate it if someone could give me a hint in the right direction. Thanks!
Magnus
All you need is actually
var regStr = @"^{0}\b";
The ^ anchor asserts the position at the beginning of the string.
The \b pattern matches at a word boundary: a position between a word character and a non-word character, or between a word character and the start or end of the string. You do not need to match the rest of the string with .* because Regex.IsMatch only checks whether a match exists; the extra .* is redundant overhead.
C# test code:
var projects = new List<string>() { "suzu", "suzu-domestic", "suzu-international", "suzuran", "suzuran-international", "scorpion", "scorpion-default", "yada", "yada-yada" };
var searchForProject = "suzu";
var regStr = @"^{0}\b"; // The regStr will be in a config file
var regEx = new Regex(String.Format(regStr, searchForProject));
var result = new List<string>();
foreach (var proj in projects)
{
    if (regEx.IsMatch(proj))
    {
        result.Add(proj);
    }
}
The foreach may be replaced with a shorter LINQ:
var result = projects.Where(s => regEx.IsMatch(s)).ToList();
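One caveat when formatting the search term straight into the pattern: any regex metacharacters in a project name (a dot, for instance) would be read as pattern syntax. The sketch below escapes the term with Regex.Escape before formatting it in. The names ProjectFilter and FindProjects are invented for illustration, and it works on plain strings rather than the Project type from the question.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text.RegularExpressions;

static class ProjectFilter
{
    // Hypothetical helper: returns the names that start with the search term
    // followed by a word boundary.
    public static List<string> FindProjects(IEnumerable<string> projectNames, string searchForProject)
    {
        string regStr = @"^{0}\b"; // same template as in the answer above
        // Regex.Escape keeps characters such as '.' or '+' in the term literal.
        var regEx = new Regex(String.Format(regStr, Regex.Escape(searchForProject)));
        return projectNames.Where(name => regEx.IsMatch(name)).ToList();
    }

    static void Main()
    {
        var projects = new List<string> { "suzu", "suzu-domestic", "suzu-international", "suzuran", "suzuran-international", "scorpion" };
        foreach (string name in FindProjects(projects, "suzu"))
        {
            Console.WriteLine(name);
        }
        // prints:
        // suzu
        // suzu-domestic
        // suzu-international
    }
}
```

Note that \b relies on the term ending in a word character; a term ending in punctuation would need a different terminator.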
You can use a regex like this:
^suzu\b.*
Working demo
If you want suzuran just use:
^suzuran\b.*
You can use "\b{0}\b.*" if you want the match anywhere in the string (but not in the middle of a word), or "^{0}\b.*" if you only want it at the start.
See a regexstorm sample.
If you want an elegant solution in one line with Linq and without regex, you can check this working solution (Demo on .NETFiddle) :
using System;
using System.Linq;
using System.Collections.Generic;
public class Program
{
    public static void Main()
    {
        string input = "suzu";
        string s = @"suzu
suzu-domestic
suzu-international
suzuran
suzuran-international
scorpion
scorpion-default
yada
yada-yada";
        foreach (var line in ExtractLines(s, input))
            Console.WriteLine(line);
    }
    // works if "-" is your delimiter.
    static IEnumerable<string> ExtractLines(string lines, string input)
    {
        return from line in lines.Split(new char[] { '\r', '\n' }, StringSplitOptions.RemoveEmptyEntries) // split the string into lines
               let cleanLine = line.Contains("-") ? line.Split('-')[0] : line // keep only the part before the first "-"
               where cleanLine.Equals(input) // check whether that part equals the input
               select line; // return the matching line
    }
}
With negative lookahead:
suzu(?!.*ran).*\b
This also uses \b for a word break

Regular expression for "$/Folder1/Folder2/Folder3/File.xml"

What should my regular expression look like if I want to validate that $/Folder1/Folder2/Folder3/File.xml always starts with $ and always ends with xml
"$/Folder1/Folder2/Folder3/File.xml"
Pass
"$/Folder1/Folder2/Folder3/File.xm"
Fail
"$/Folder1/Folder2/Folder3/File.py"
Fail
"A/Folder1/Folder2/Folder3/File.xml"
Fail
Edit... So... The right regular expression is...
"^\$.*xml$"
The method, after implementing the regex check, looks like...
public bool ValidateConfigPath(string config)
{
    var match = Regex.Match(config, @"^\$.*xml$", RegexOptions.IgnoreCase);
    return match.Success;
}
And all my unit tests pass...
[TestMethod]
public void ValidateConfigPath_InCorrect1()
{
    var t = new TfsWrapper();
    var isValid = t.ValidateConfigPath("$/Quantz/Main/CSS Calculator/main.py");
    Assert.IsFalse(isValid);
}
[TestMethod]
public void ValidateConfigPath_InCorrect2()
{
    var t = new TfsWrapper();
    var isValid = t.ValidateConfigPath("C:/Quantz/Main/CSS Calculator/main.xml");
    Assert.IsFalse(isValid);
}
[TestMethod]
public void ValidateConfigPath_Correct()
{
    var t = new TfsWrapper();
    var isValid = t.ValidateConfigPath("$/Quantz/Main/CSS Calculator/main.xml");
    Assert.IsTrue(isValid);
}
If there's not a strict requirement for using regular expressions, I recommend the more straight-forward approach of simply checking the starting and ending characters:
config.StartsWith("$") && config.EndsWith("xml")
With the above, the intent is absolutely clear to anyone, including people who don't understand regular expressions.
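In C#, that check might be sketched as follows. The class name ConfigPathValidator is hypothetical, and the case-insensitive comparison on the suffix is an assumption that mirrors the RegexOptions.IgnoreCase used in the question.

```csharp
using System;

static class ConfigPathValidator
{
    // Valid when the path starts with '$' and ends with "xml" (any casing).
    public static bool ValidateConfigPath(string config)
    {
        return config.StartsWith("$", StringComparison.Ordinal)
            && config.EndsWith("xml", StringComparison.OrdinalIgnoreCase);
    }

    static void Main()
    {
        Console.WriteLine(ValidateConfigPath("$/Folder1/Folder2/Folder3/File.xml")); // True
        Console.WriteLine(ValidateConfigPath("$/Folder1/Folder2/Folder3/File.xm"));  // False
        Console.WriteLine(ValidateConfigPath("$/Folder1/Folder2/Folder3/File.py"));  // False
        Console.WriteLine(ValidateConfigPath("A/Folder1/Folder2/Folder3/File.xml")); // False
    }
}
```

Passing an explicit StringComparison avoids culture-sensitive comparisons, which is usually what you want for file paths.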
Have you read a tutorial?
^\$.*xml$
^ is the beginning of the string. \$ is a literal $ character. .* is 0 or more arbitrary characters (in fact, no line breaks, but that does not seem to matter in your input example). xml is really just xml. And $ is the end of the string.
Try this:-
^\$.*xml$
Check this link for details

How to extract the useful data with regular expression in C#?

Sorry guys, it seems like I didn't explain my question clearly. Please allow me to rephrase my question again.
I use WebClient to download the whole webpage and I got the content as a string
"
.......
.....
var picArr ="/d/manhua/naruto/516/1.png|/d/manhua/naruto/516/2.png|/d/manhua/naruto/516/3.png|/d/manhua/naruto/516/4.png|/d/manhua/naruto/516/5.png|/d/manhua/naruto/516/6.png|/d/manhua/naruto/516/7.png|/d/manhua/naruto/516/8.png|/d/manhua/naruto/516/9.png|/d/manhua/naruto/516/10.png|/d/manhua/naruto/516/11.png|/d/manhua/naruto/516/12.png|/d/manhua/naruto/516/13.png|/d/manhua/naruto/516/14.png|/d/manhua/naruto/516/15.png|/d/manhua/naruto/516/16.png"
......
";
From this content, I want to get only one line, which is
var picArr ="/d/manhua/naruto/516/1.png|/d/manhua/naruto/516/2.png|/d/manhua/naruto/516/3.png|/d/manhua/naruto/516/4.png|/d/manhua/naruto/516/5.png|/d/manhua/naruto/516/6.png|/d/manhua/naruto/516/7.png|/d/manhua/naruto/516/8.png|/d/manhua/naruto/516/9.png|/d/manhua/naruto/516/10.png|/d/manhua/naruto/516/11.png|/d/manhua/naruto/516/12.png|/d/manhua/naruto/516/13.png|/d/manhua/naruto/516/14.png|/d/manhua/naruto/516/15.png|/d/manhua/naruto/516/16.png"
Now I want to use a regular expression to find this line and get the value of picArr.
My regular expression is
var picArr ="([.]*)"
I think the dot means any character, but it doesn't work. :(
Any ideas?
Thanks a lot
/picArr =\"([^\"]+)\"/
If I got this right that's what you need.
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text;
using System.Text.RegularExpressions;
namespace ExtractFileNames
{
    class Program
    {
        static void Main(string[] args)
        {
            string pageData = @"blah blah
var picArr =""/d/manhua/naruto/516/1.png|/d/manhua/naruto/516/2.png|/d/manhua/naruto/516/3.png|/d/manhua/naruto/516/4.png|/d/manhua/naruto/516/5.png|/d/manhua/naruto/516/6.png|/d/manhua/naruto/516/7.png|/d/manhua/naruto/516/8.png|/d/manhua/naruto/516/9.png|/d/manhua/naruto/516/10.png|/d/manhua/naruto/516/11.png|/d/manhua/naruto/516/12.png|/d/manhua/naruto/516/13.png|/d/manhua/naruto/516/14.png|/d/manhua/naruto/516/15.png|/d/manhua/naruto/516/16.png""
more blah decimal blah";
            var match = Regex.Match(pageData, @"var\s+picArr\s*=\s*""(.*?)""");
            var str = match.Groups[1].Value;
            var files = str.Split('|');
            foreach(var f in files)
            {
                Console.WriteLine(f);
            }
            Console.ReadLine();
        }
    }
}
Output:
/d/manhua/naruto/516/1.png
/d/manhua/naruto/516/2.png
/d/manhua/naruto/516/3.png
/d/manhua/naruto/516/4.png
/d/manhua/naruto/516/5.png
/d/manhua/naruto/516/6.png
/d/manhua/naruto/516/7.png
/d/manhua/naruto/516/8.png
/d/manhua/naruto/516/9.png
/d/manhua/naruto/516/10.png
/d/manhua/naruto/516/11.png
/d/manhua/naruto/516/12.png
/d/manhua/naruto/516/13.png
/d/manhua/naruto/516/14.png
/d/manhua/naruto/516/15.png
/d/manhua/naruto/516/16.png
If you just want to get the filenames, you could just do a split on the pipe:
var picArr = "/d/manhua/naruto/516/1.png|/d/manhua/naruto/516/2.png|/d/manhua/naruto/516/3.png|/d/manhua/naruto/516/4.png|/d/manhua/naruto/516/5.png|/d/manhua/naruto/516/6.png|/d/manhua/naruto/516/7.png|/d/manhua/naruto/516/8.png|/d/manhua/naruto/516/9.png|/d/manhua/naruto/516/10.png|/d/manhua/naruto/516/11.png|/d/manhua/naruto/516/12.png|/d/manhua/naruto/516/13.png|/d/manhua/naruto/516/14.png|/d/manhua/naruto/516/15.png|/d/manhua/naruto/516/16.png";
var splitPics = picArr.Split('|');
foreach (var pic in splitPics)
{
    Console.WriteLine(pic);
}
It looks like you want the value of the string literal in your snippet, "/d/manhua/naruto/516/1.png|..."
Get rid of the square brackets. "." matches any character just as it is, without brackets. Square brackets are for matching a limited set of characters: For example, you'd use "[abc]" to match any "a", "b", or "c".
It looks like the brackets have the effect of escaping the ".", a feature I hadn't known about (or forgot, sometime in the Ordovician). But I tested the regex as you have it with the string value replaced with a series of dots, and the regex matched. It's being treated as a literal "." character, which you would more likely try to match with a backslash escape: "\."
So just get rid of the brackets and it should work. It works in VS2008 for me.
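To make the difference concrete, here is a small sketch comparing the bracketed and unbracketed dot. The DotDemo class, the Capture helper and the shortened sample input are all made up for illustration.

```csharp
using System;
using System.Text.RegularExpressions;

static class DotDemo
{
    // Hypothetical helper: returns the first capture group, or null when the
    // pattern does not match.
    public static string Capture(string pattern, string input)
    {
        Match m = Regex.Match(input, pattern);
        return m.Success ? m.Groups[1].Value : null;
    }

    static void Main()
    {
        string input = "var picArr =\"/d/1.png|/d/2.png\"";

        // [.] is a character class containing only a literal dot, so this fails.
        Console.WriteLine(Capture("var picArr =\"([.]*)\"", input) ?? "(no match)"); // (no match)

        // A bare dot matches any character except a newline.
        Console.WriteLine(Capture("var picArr =\"(.*?)\"", input)); // /d/1.png|/d/2.png

        // The bracketed version does match when the value is nothing but dots.
        Console.WriteLine(Capture("var picArr =\"([.]*)\"", "var picArr =\"...\"")); // ...
    }
}
```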
