Matching string separated by - using regex - c#

Regex is not my favorite thing, but it certainly has its uses. Right now I'm trying to match a string of this form:
[video-{service}-{id}]
An example of such a string:
[video-123abC-zxv9.89]
In the example above I would like to get the "service" 123abC and the "id" zxv9.89.
So far this is what I've got. It's probably overcomplicated.
var regexPattern = @"\[video-(?<id1>[^]]+)(-(?<id2>[^]]+))?\]";
var ids = Regex.Matches(text, regexPattern, RegexOptions.IgnoreCase)
.Cast<Match>()
.Select(m => new VideoReplaceItem()
{
Tag = m.Value,
Id = string.IsNullOrWhiteSpace(m.Groups["id1"].Value) == false ? m.Groups["id1"].Value : "",
Service = string.IsNullOrWhiteSpace(m.Groups["id2"].Value) == false ? m.Groups["id2"].Value : "",
}).ToList();
This does not work: it puts all the characters after '[video-' into the Id variable.
Any suggestions?

The third part seems to be optional. The [^]]+ is actually matching the - symbol, and to fix the expression, you either need to make the first [^]]+ lazy ([^]]+?) or add a hyphen to the negated character class.
Use
\[video-(?<id1>[^]-]+)(-(?<id2>[^]-]+))?]
See the regex demo
Or with a lazy quantifier on the first character class (note the ? after the first [^]]+):
\[video-(?<id1>[^]]+?)(-(?<id2>[^]]+))?]
See another demo.
Since you are using named groups, you may compile the regex object with RegexOptions.ExplicitCapture option to make the regex engine treat all numbered capturing groups as non-capturing ones (so as not to add ?: after the ( that defines the optional (-(?<id2>[^]-]+))? group).
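A minimal sketch of the hyphen-excluding pattern combined with RegexOptions.ExplicitCapture. The sample input is made up for illustration, and the code is written as C# top-level statements (C# 9 or later), which is an assumption about the compiler in use rather than something stated in the question.

```csharp
using System;
using System.Text.RegularExpressions;

// Hypothetical sample input: one full tag and one tag without the optional part.
string text = "intro [video-123abC-zxv9.89] middle [video-onlyOne] end";

// The hyphen inside the negated class stops the first group at the separator.
string pattern = @"\[video-(?<id1>[^]-]+)(-(?<id2>[^]-]+))?]";

// ExplicitCapture: the unnamed (-...)? group is treated as non-capturing.
var regex = new Regex(pattern, RegexOptions.IgnoreCase | RegexOptions.ExplicitCapture);

MatchCollection matches = regex.Matches(text);
foreach (Match m in matches)
{
    // A group that did not take part in the match has an empty Value.
    Console.WriteLine($"{m.Value}: id1='{m.Groups["id1"].Value}', id2='{m.Groups["id2"].Value}'");
}
```

Because a group that did not participate yields an empty string, the string.IsNullOrWhiteSpace checks from the question are not strictly needed.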

Try this:
\[video-(?<service>[^]]+?)(-(?<id>[^]]+))?\]
The "?" in the service group makes the expression before it "lazy" (meaning it matches the fewest possible characters to satisfy the overall expression).
I would recommend Regexstorm.net for .NET regex testing: http://regexstorm.net/tester

Related

Regex for searching for tags in docx files

I'm trying to use Gembox.Document to search a docx file for a tag and to retrieve the value held within the tag. The tag will always be <! and !>, for example, <!sometexthere!> will return sometexthere.
However, I can't get my regex to work properly - I've got the below.
var pattern = Regex.Escape("<!(.*?)!>");
Any help is appreciated. Thanks.
To get all the values, use Regex.Matches instead of Regex.Escape:
var res = Regex.Matches(s, @"<!(.*?)!>")
.Cast<Match>()
.Select(m => m.Groups[1].Value)
.ToList();
Regex.Escape is only used to escape literal strings that will be used inside regular expression patterns, e.g. . will become \. to match a literal dot symbol. Regex.Match searches for a single match, while Regex.Matches returns all non-overlapping matches. Since you need just the Group 1 value, the Select clause is quite handy here: .Select(m => m.Groups[1].Value) returns just the values captured by Group 1 in the pattern.
See this online C# demo

Regex to extract text from a pattern in C#

I have a string pattern, which contains a ID and Text to make markup easier for our staff.
The pattern to create a "fancy" button in our CMS is:
(button_section_ID:TEXT)
Example:
(button_section_25:This is a fancy button)
How do I extract the "This is a fancy button" part of that pattern? The pattern will always be the same. I tried to do some substring stuff but that got complicated very fast.
Any help would be much appreciated!
If the text is always in the format you specified, you just need to trim the parentheses and then split on the colon:
var res = input.Trim('(', ')').Split(':')[1];
If the string is a substring, use a regex:
var match = Regex.Match(input, @"\(button_section_\d+:([^()]+)\)");
var res = match.Success ? match.Groups[1].Value : "";
See this regex demo.
Explanation:
\(button_section_ - a literal (button_section_
\d+ - 1 or more digits
: - a colon
([^()]+) - Group 1, capturing 1 or more characters other than ( and ) (you may replace it with ([^)]*) to make matching safer and allow an empty string and a ( inside this value)
\) - a literal )
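As a sketch of the same pattern applied to a longer text with several buttons, something like the following should work. The input string is invented for illustration, and the snippet uses C# top-level statements (C# 9 or later).

```csharp
using System;
using System.Linq;
using System.Text.RegularExpressions;

// Hypothetical CMS text containing two button markers.
string input = "Intro (button_section_25:This is a fancy button) and (button_section_7:Second one).";

// Group 1 holds the text between the colon and the closing parenthesis.
var texts = Regex.Matches(input, @"\(button_section_\d+:([^()]+)\)")
    .Cast<Match>()
    .Select(m => m.Groups[1].Value)
    .ToList();

foreach (string t in texts)
{
    Console.WriteLine(t);
}
```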
The following .NET regex will give you a match containing a group with the text you want:
var match = Regex.Match(input, @"\(button_section_..:(.*)\)");
The parentheses define a capture group, which will give you everything between the colon after the button section and the final closing parenthesis. Note that .. matches exactly two characters, so this pattern assumes a two-character ID.

Regex Match method does not work correctly

I have created a Regex as below, but the Match method does not work correctly:
Regex regex = new Regex("(" + SearchText + ")", RegexOptions.IgnoreCase);
if(regex.Match(item).Success) { ... }
For example, if I set SearchText to e., and I set item to es, then Success is true.
Similarly, if I set SearchText to $ or ., then a match with 4 returns Success as true.
How come this is happening, and how can I solve this problem?
When you use a regex there are a bunch of common characters which have special meanings. For example, the period (.) character will match any character at all so if you wanted to match the words dog and dig, you could use the regex d.g.
There are MANY different special characters you can use, you should see the full .NET Regex documentation for more details.
This makes matching specific things slightly more complicated when you want to match something specific, like the end of a sentence. To match dog. you actually have to pass in dog\. as the regex to match against. You can use the Regex.Escape(string str) method to escape most simple strings, before passing them into your Regex constructor.
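A small sketch of that advice, using the values from the question. The variable names are illustrative, and the snippet is written as C# top-level statements (C# 9 or later).

```csharp
using System;
using System.Text.RegularExpressions;

// User-supplied search text that happens to contain a regex metacharacter.
string searchText = "e.";

// Escape turns the dot into \. so it only matches a literal dot.
string escaped = Regex.Escape(searchText);
var regex = new Regex(escaped, RegexOptions.IgnoreCase);

Console.WriteLine(escaped);
Console.WriteLine(regex.IsMatch("es"));          // "es" contains no literal "e."
Console.WriteLine(regex.IsMatch("bla E. bla"));  // case-insensitive literal match
```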
The other question is: if you are only looking for literals, why use Regex at all?
string item = "bla bla e. bla";
bool result = item.Contains("e."); //returns true
Edit
Case insensitive:
result = item.IndexOf("e.", 0, StringComparison.OrdinalIgnoreCase) != -1;

How to match all regular expression groups with or without character between them

Here is my regular expression
(".+?")*([^{}\s]+)*({.+?})*
Generally matching with this expression work well, but only if there is any character between matched groups. For example this:
{1.0 0b1 2006-01-01_12:34:56.789} {1.2345 0b100000 2006-01-01_12:34:56.789}
produces two matches:
1. {1.0 0b1 2006-01-01_12:34:56.789}
2. {1.2345 0b100000 2006-01-01_12:34:56.789}
but this:
{1.0 0b1 2006-01-01_12:34:56.789}{1.2345 0b100000 2006-01-01_12:34:56.789}
produces only one match, containing the last object:
{1.2345 0b100000 2006-01-01_12:34:56.789}
PS. I'm using switch g for global match
EDIT:
I did some research in the meantime and I need to provide additional data. I pasted the whole regular expression, which also matches words and strings, so the asterisk after the groups is necessary.
EDIT2: Here is example text:
COMMAND STATUS {OBJECT1}{OBJECT2} "TEXT" "TEXT"
As a result I want these groups:
COMMAND
STATUS
{OBJECT1}
{OBJECT2}
"TEXT1"
"TEXT2"
Here is my actual C# code:
var regex = new Regex("(\".+?\")*([^{}\\s]+)*({.+?})*");
var matches = regex.Matches(responseString);
return matches
.Cast<Match>()
.Where(match => match.Success && !string.IsNullOrWhiteSpace(match.Value))
.Select(match => CommandParameter.Parse(match.Value));
You can use the following regex to capture all the {...}s:
(".+?"|[^{}\s]+|{[^}]+?})
See demo here.
My approach to capturing anything between a pair of single-character delimiters is to use a negated character class containing the closing delimiter. Also, since you are matching non-empty text, you'd be better off using the + quantifier, which ensures at least one character is matched.
EDIT:
Instead of making each group optional, you should use alternative lists.
You have an extra quantifier * for ({.+?}) sub-pattern.
You can use this regex:
("[^"]*"|{[^}]*}|[^{}\s]+)
RegEx Demo
Note how it matches both objects in either case, whether there is a space between them or not.
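A minimal sketch of the alternation approach in C#, applied to a line like the one in the question. The braces are escaped here only to be explicit, the sample text uses TEXT1 and TEXT2 as in the desired output, and the snippet uses C# top-level statements (C# 9 or later).

```csharp
using System;
using System.Linq;
using System.Text.RegularExpressions;

// Sample line: two objects with no space between them, then two quoted strings.
string input = "COMMAND STATUS {OBJECT1}{OBJECT2} \"TEXT1\" \"TEXT2\"";

// One alternative per token kind: quoted string, {object}, or bare word.
var tokenRegex = new Regex(@"""[^""]*""|\{[^}]*\}|[^{}\s]+");

var tokens = tokenRegex.Matches(input)
    .Cast<Match>()
    .Select(m => m.Value)
    .ToList();

foreach (string token in tokens)
{
    Console.WriteLine(token);
}
```

Because every alternative must consume at least its delimiters or one character, the empty matches produced by the original all-optional pattern do not occur, so the IsNullOrWhiteSpace filter should no longer be needed.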

need to create a C# Regex similar to this perl expression

I was wondering if it is possible to build equivalent C# regular expression for finding this pattern in a filename. For example, this is the expr in perl /^filer_(\d{10}).txt(.gz)?$/i Could we find or extract the \d{10} part so I can use it in processing?
To create a Regex object that will ignore character casing and match your filter try the following:
Regex fileFilter = new Regex(@"^filer_(\d{10})\.txt(\.gz)?$", RegexOptions.IgnoreCase);
To perform the match:
Match match = fileFilter.Match(filename);
And to get the value (number here):
if(match.Success)
{
string id = match.Groups[1].Value;
}
The matched groups work similar to Perl's matches, [0] references the whole match, [1] the first sub pattern/match, etc.
Note: In your initial perl code you didn't escape the . characters so they'd match any character, not just real periods!
Yes, you can. See the Groups property of the Match class that is returned by a call to Regex.Match.
In your case, it would be something along the lines of the following:
Regex yourRegex = new Regex(@"^filer_(\d{10}).txt(.gz)?$");
Match match = yourRegex.Match(input);
if(match.Success)
result = match.Groups[1].Value;
I don't know what the /i at the end of your regex means, so I removed it in my sample code.
As Daniel shows, you can access the content of the matched input via groups. But instead of using the default indexed groups you can also use named groups. The following shows how, and also that you can use the static version of Match.
Match m = Regex.Match(input, @"^(?i)filer_(?<fileID>\d{10}).txt(?:.gz)?$");
if(m.Success)
{
string s = m.Groups["fileID"].Value;
}
The /i in perl means IgnoreCase as also shown by Mario. This can also be set inline in the regex statement using (?i) as shown above.
The last part (?:.gz) creates a non-capturing group, which means that it’s used in the match but no group is created.
I'm not sure if that's what you want, but this is how you can do it.
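Putting the points from the answers above together (escaped dots, IgnoreCase, a named group), a sketch could look like this. TryGetFileId and the sample file names are hypothetical, introduced only for illustration, and the snippet uses C# top-level statements (C# 9 or later).

```csharp
using System;
using System.Text.RegularExpressions;

// Escaped dots match literal periods only; (?<fileID>...) names the capture group.
var fileRegex = new Regex(@"^filer_(?<fileID>\d{10})\.txt(?:\.gz)?$", RegexOptions.IgnoreCase);

// Hypothetical helper: returns true and sets id when the name fits the pattern.
bool TryGetFileId(string fileName, out string id)
{
    Match m = fileRegex.Match(fileName);
    id = m.Success ? m.Groups["fileID"].Value : "";
    return m.Success;
}

// Made-up file names: two valid, one too short, one with a non-period separator.
string[] names = { "filer_0123456789.txt", "FILER_9876543210.TXT.GZ", "filer_123.txt", "filer_0123456789xtxt" };
foreach (string name in names)
{
    Console.WriteLine(TryGetFileId(name, out string id) ? $"{name} -> {id}" : $"{name} -> no match");
}
```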
