Extract string between braces using RegEx, ie {{content}}


I am given a string that has placeholders in the format {{some_text}}. I would like to extract these into a collection using C#, and I believe RegEx is the best way to do this. RegEx is a little over my head, but it seems powerful enough to work in this case. Here is my example:
<a title="{{element='title'}}" href="{{url}}">
<img border="0" alt="{{element='title'}}" src="{{element='photo' property='src' maxwidth='135'}}" width="135" height="135" /></a>
<span>{{element='h1'}}</span>
<span><strong>{{element='price'}}<br /></strong></span>
I would like to end up with something like this:
collection[0] = "element='title'";
collection[1] = "url";
collection[2] = "element='photo' property='src' maxwidth='135'";
collection[3] = "element='h1'";
collection[4] = "element='price'";
Notice that there are no duplicates either, but I do not want to complicate things if it is difficult to do.
I saw this post that does something similar but within brackets:
How to extract the contents of square brackets in a string of text in c# using Regex
My problem here is that I have double braces instead of just one character. How can I do this?

Taking exactly from the question you linked:
ICollection<string> matches =
    Regex.Matches(s.Replace(Environment.NewLine, ""), @"\{\{([^}]*)\}\}")
        .Cast<Match>()
        .Select(x => x.Groups[1].Value)
        .ToList();
foreach (string match in matches)
    Console.WriteLine(match);
I've changed the [ and ] to {{ and }} (escaped). This should make the collection you need. Be sure to read the first answer to the other question for the regex breakdown. It's important to understand it if you use it.

RegEx is more than powerful enough for what you need.
Try this regular expression:
\{\{.*?\}\}
That will match expressions between double braces, lazily.
Edit: that will give you the strings, including the double braces. You can parse them manually, but if the regex engine supports lookahead and lookbehind (the .NET engine does), you can get what's inside directly with something like:
(?<=\{\{).*?(?=\}\})

You will need to get rid of the duplicates after you have the matches.
\{\{(.*?)}}
Result 1
element='title'
Result 2
url
Result 3
element='title'
Result 4
element='photo' property='src' maxwidth='135'
Result 5
element='h1'
Result 6
element='price'
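
As a rough sketch of how the capture-group pattern and the duplicate removal could fit together: the class name PlaceholderExtractor and the method name Extract are made up for illustration, and the use of LINQ's Distinct() is an assumption about how one might drop the repeats, not something the answers above prescribe.

```csharp
using System.Collections.Generic;
using System.Linq;
using System.Text.RegularExpressions;

public static class PlaceholderExtractor
{
    // Hypothetical helper: returns the text inside each {{...}} placeholder,
    // with repeated placeholders removed.
    public static List<string> Extract(string template)
    {
        return Regex.Matches(template, @"\{\{(.*?)\}\}")
            .Cast<Match>()                   // MatchCollection -> IEnumerable<Match>
            .Select(m => m.Groups[1].Value)  // group 1 is the text between the braces
            .Distinct()                      // drop duplicates such as the repeated title
            .ToList();
    }
}
```

Note that the documentation describes the result of Distinct() as unordered, even though current implementations return items in first-occurrence order. If the order matters, it may be safer to collect the values into a list manually and skip the ones already seen.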

Related

RegEx for matching special chars no spaces or newlines

I have a string and want to use regex to match all the chars, but no spaces.
I tried to replace all the spaces with nothing, using:
//rating
var betyg = Regex.Replace(seller, @"[A-z](.+)", m => m.Groups[1].Value);
I expect the output of
"Iris-presenter | 5"
but, the output is
"Iris-presenter"
as seen in this demo.
The string is:
<spaces>Iris-presenter
<spaces>|
<spaces>5

Great question! I'm not sure whether this is what you are looking for, but this expression matches your input string:
^((?!\s|\n).)*
Edit
Based on revo's advice, the expression can be simplified considerably, because \s already includes \n:
^((?!\s|\n).)* is equivalent to ^((?!\s).)* and both are equivalent to ^\S*.

I used (\s(.*?)) to make it work. This removes all spaces and newlines.
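
If the goal is the output quoted in the question ("Iris-presenter | 5") rather than removing every space, one possible approach is to collapse each run of whitespace into a single space. This is only a sketch under that assumption about the intent; the class name WhitespaceNormalizer and the method name Collapse are invented for the example.

```csharp
using System.Text.RegularExpressions;

public static class WhitespaceNormalizer
{
    // Hypothetical helper: trims both ends, then replaces every run of
    // whitespace (spaces, tabs, newlines) with a single space.
    public static string Collapse(string input)
    {
        return Regex.Replace(input.Trim(), @"\s+", " ");
    }
}
```

Replacing with "" instead of " " would remove the whitespace entirely, which is what the (\s(.*?)) approach does.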

Regular Expression for a JSON-type String

I am having trouble splitting a String using a regular expression
"[{'name':'abc','surname':'def'},{'name':'ghi','surname':'jkl'},{'name':'asdf','surname':'asdf'}]"
Now I'd like to split this to
"{'name':'abc','surname':'def'}" and "{'name':'ghi','surname':'jkl'}"
Later on I will deserialize both Strings and work with the values. I must admit that I've worked way too little with regular expressions and would love if someone could help me. I want to split by those square brackets as well as by the middle comma. I was either splitting by ALL commas or not splitting at all.
Kind regards

This Regex will do that:
({.*?})
and here is a Regex 101 to prove it.
To use it you might do something like this:
var matches = Regex.Matches(input, pattern);
// each Match in matches is one {...} object; Regex.Match would return only the first
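
A minimal sketch of the idea, assuming the objects are flat (no nested braces and no '}' inside a value). The names ObjectSplitter and Split are hypothetical.

```csharp
using System.Collections.Generic;
using System.Linq;
using System.Text.RegularExpressions;

public static class ObjectSplitter
{
    // Hypothetical helper: returns each {...} chunk of a flat, JSON-like array string.
    public static List<string> Split(string input)
    {
        return Regex.Matches(input, @"\{.*?\}")  // lazy: stop at the first closing brace
            .Cast<Match>()
            .Select(m => m.Value)                 // the whole match, braces included
            .ToList();
    }
}
```

Each returned string can then be deserialized on its own. For nested objects or values that contain braces, a real JSON parser is likely to be more robust than a regular expression.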

Regular expression to parse [A][B] into A and B

I am trying to separate the following string into separate lines with a regular expression
[property1=text1][property2=text2]
and the desired result should be
property1=text1
property2=text2
Here is my code:
string[] attrs = Regex.Split(attr_str, @"\[(.+)\]");
The result is incorrect; I am probably doing something wrong.
UPDATE: after applying the suggested answers, the result now contains spaces and empty strings.

.+ is a greedy match, so it grabs as much as possible.
Use either
\[([^]]+)\]
or
\[(.+?)\]
In the first case, matching ] is not allowed, so "as much as possible" becomes shorter. The second uses a non-greedy match.

Your dot is grabbing the brackets as well. You need to exclude brackets:
\[([^]]+)\]
The [^]] matches any character except a closing bracket.

Try adding the 'lazy' specifier:
Regex.Split(attr_str, @"\[(.+?)\]");

Try:
var s = "[property1=text1][property2=text2]";
var matches = Regex.Matches(s, @"\[(.+?)\]")
    .Cast<Match>()
    .Select(m => m.Groups[1].Value);
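
The empty strings mentioned in the question's UPDATE come from Regex.Split itself: it includes the captured group text in the result and also emits an empty string before, between, and after adjacent matches. The sketch below contrasts the two approaches; BracketParser and its method names are hypothetical.

```csharp
using System.Collections.Generic;
using System.Linq;
using System.Text.RegularExpressions;

public static class BracketParser
{
    // Collects group 1 of every [..] match; no empty entries are produced.
    public static List<string> ParseWithMatches(string input)
    {
        return Regex.Matches(input, @"\[([^\]]+)\]")
            .Cast<Match>()
            .Select(m => m.Groups[1].Value)
            .ToList();
    }

    // Regex.Split returns the captured text plus empty strings around the
    // matches, so the empty entries have to be filtered out afterwards.
    public static List<string> ParseWithSplit(string input)
    {
        return Regex.Split(input, @"\[([^\]]+)\]")
            .Where(s => s.Length > 0)
            .ToList();
    }
}
```

Of the two, the Regex.Matches version states the intent more directly, since the goal is to collect values rather than to split on a delimiter.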

parsing tweet text with regex

Regex-noob here. Looking for some C# regex code to "syntax highlight" twitter text. So given this tweet:
@taglius here's some tweet text that shouldn't be highlighted #tagtestpix http://aurl.jpg
I want to find the user mentions (@), hashtags (#), and urls (http://) and add appropriate html to color highlight these elements. Something like
<font color=red>@taglius</font> here's some tweet text that shouldn't be highlighted <font color=blue>#tagtestpix</font> <font color=yellow>http://aurl.jpg</font>
This isn't the exact html I will use, but I think you get the idea.

The answers above are parts of the whole answer, so I think I can add a little extra to answer your question:
Your highlight function would look something like this:
public static String HighlightTwitter(String input)
{
    String result = Regex.Replace(input, @"(?<!\S)@\w+", @"<font color=""red"">$0</font>");
    result = Regex.Replace(result, @"(?<!\S)#\w+", @"<font color=""blue"">$0</font>");
    result = Regex.Replace(result, @"\bhttps?://[-\w]+(\.\w[-\w]*)+(:\d+)?(/[^.!,?;""\'<>()\[\]\{\}\s\x7F-\xFF]*([.!,?]+[^.!,?;""\'<>\(\)\[\]\{\}\s\x7F-\xFF]+)*)?\b", @"<font color=""yellow"">$0</font>", RegexOptions.IgnoreCase);
    return result;
}
I have used the lookbehind (?<!\S) to make sure that @ and # are at the start of a word, and \b to make sure that URLs stand alone. A plain \b does not work in front of @ or #, because neither is a word character, so \b@ only matches when a word character precedes the @. This means that @this_will_highlight but@this_will_not.
If performance might be an issue, you can make the Regex instances static members with RegexOptions.Compiled.
E.g.:
private static readonly Regex regexAt = new Regex(@"(?<!\S)@\w+", RegexOptions.Compiled);
...
String result = regexAt.Replace(input, @"<font color=""red"">$0</font>");
...

The following would match the '#' character followed by a sequence of alphanumeric characters:
#\w+
The following would match the '@' character followed by a sequence of alphanumeric characters:
\@\w+
There are a lot of free-form http url match expressions, this is the one I use most commonly:
https?://[-\w]+(\.\w[-\w]*)+(:\d+)?(/[^.!,?;""\'<>()\[\]\{\}\s\x7F-\xFF]*([.!,?]+[^.!,?;""\'<>\(\)\[\]\{\}\s\x7F-\xFF]+)*)?
Lastly, you're going to get false-positive hits with all of these, so you will need to look hard at how to correctly delineate these tags. For instance, take the following tweet:
the url http://Roger@example.com/#bookmark is interesting.
Obviously this is going to be a problem, as all three of the expressions will match inside the URL. To avoid this you will need to figure out what characters are allowed to precede or follow the match. As an example, the following requires whitespace or the start of the string to precede the @name reference and requires a ',' or whitespace to follow it.
(?<!\S)@\w+(?=[,\s])
Regex patterns are not easy; I recommend getting a tool like Expresso.
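
To illustrate the point about delimiting matches, here is a small sketch that only accepts a mention or hashtag when it is preceded by whitespace or the start of the string and followed by a comma, whitespace, or the end of the string. The class TweetTokens and its members are hypothetical, and the exact set of allowed neighbouring characters is an assumption that a real application would probably need to widen.

```csharp
using System.Collections.Generic;
using System.Linq;
using System.Text.RegularExpressions;

public static class TweetTokens
{
    // (?<!\S)      : not preceded by a non-whitespace character
    // (?=[,\s]|$)  : followed by a comma, whitespace, or the end of the input
    private static readonly Regex Mention = new Regex(@"(?<!\S)@\w+(?=[,\s]|$)");
    private static readonly Regex Hashtag = new Regex(@"(?<!\S)#\w+(?=[,\s]|$)");

    public static List<string> Mentions(string tweet)
    {
        return Mention.Matches(tweet).Cast<Match>().Select(m => m.Value).ToList();
    }

    public static List<string> Hashtags(string tweet)
    {
        return Hashtag.Matches(tweet).Cast<Match>().Select(m => m.Value).ToList();
    }
}
```

With these boundaries, the @example and #bookmark parts of http://Roger@example.com/#bookmark are skipped, because each is preceded by a non-whitespace character.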

You can parse out the @ replies using (\@\w+). You can parse out the hash tags using (#\w+).

Why is my .NET regex not working correctly?

I have a text file which is in the format:
key1:val1,
key2:val2,
key3:val3
and I am trying to parse the key/value pairs out with a regex. Here is the regex code I am using with the same example:
string input = @"key1:val1,
key2:val2,
key3:val3";
var r = new Regex(@"^(?<name>\w+):(?<value>\w+),?$", RegexOptions.Multiline | RegexOptions.ExplicitCapture);
foreach (Match m in r.Matches(input))
{
    Console.WriteLine(m.Groups["name"].Value);
    Console.WriteLine(m.Groups["value"].Value);
}
When I loop through r.Matches, sometimes certain key/value pairs don't appear, and it seems to be the ones with the comma at the end of the line - but I should be taking that into account with the ,?. What am I missing here?

This might be a good situation for String.Split rather than a regex:
foreach (string pair in input.Split(new Char[] { ',' }))
{
    // Trim removes the line break left at the start of every pair after the first
    string[] items = pair.Trim().Split(new Char[] { ':' });
    Console.WriteLine(items[0]);
    Console.WriteLine(items[1]);
}

The problem is that your regular expression does not account for the line ending of the first two lines. With RegexOptions.Multiline, $ matches only before \n, so if the input uses Windows-style \r\n line endings, the \r is left over between the optional comma and the $.
Try changing it to
@"^(?<name>\w+):(?<value>\w+),?(\n|\r|\r\n)?$"
and it should work.
By the way, I love regular expressions, but given the problem you are trying to solve, go for the string.Split solution. It will be much easier to read...
EDIT: after reading your comment, where you say that this is a simplified version of your problem, maybe you could simplify the expression by adding some "tolerance" for spaces and newlines at the end of the match with
@"^(?<name>\w+):(?<value>\w+),?\s*$"
Also, when you play with regular expressions, test them with a tool like Expresso, it saves a lot of time.
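
A short sketch of the line-ending fix, assuming the input may use either \n or \r\n line endings. KeyValueParser and Parse are made-up names, and the \r? before $ is one of several ways to absorb the carriage return.

```csharp
using System.Collections.Generic;
using System.Text.RegularExpressions;

public static class KeyValueParser
{
    // \r? absorbs the carriage return of a \r\n line ending, because in
    // Multiline mode $ matches only before \n or at the end of the input.
    private static readonly Regex Pair = new Regex(
        @"^(?<name>\w+):(?<value>\w+),?\r?$",
        RegexOptions.Multiline | RegexOptions.ExplicitCapture);

    // Hypothetical helper: returns the key/value pairs in input order.
    public static List<KeyValuePair<string, string>> Parse(string input)
    {
        var pairs = new List<KeyValuePair<string, string>>();
        foreach (Match m in Pair.Matches(input))
        {
            pairs.Add(new KeyValuePair<string, string>(
                m.Groups["name"].Value, m.Groups["value"].Value));
        }
        return pairs;
    }
}
```

The same pattern should handle input with plain \n line endings, since the \r is optional.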

Get rid of the RegexOptions.Multiline option.
