How to match an enclosing substring using regex in .NET - c#

I'm trying to match * in id=resultsStats>*<nobr> to extract the middle bit.
This would match e.g.
id=resultsStats>3<nobr>
id=resultsStats>anything<nobr>
so I can extract the middle "3" or "anything"
How do I do this in .NET regex or otherwise?

(?<=id=resultsStats>).+?(?=<nobr>)
Use * instead of + if content is optional rather than required.
Example of use (F#):
open System.Text.RegularExpressions
let tryFindResultsStats input =
    let m = Regex.Match (input,
                         "(?<=id=resultsStats>).+?(?=<nobr>)",
                         RegexOptions.Singleline)
    if m.Success then Some m.Value else None
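Since the question asks about .NET in general, here is a rough C# equivalent of the same lookbehind/lookahead idea. The class and method names (ResultsStats, TryFind) are made up for illustration; only the pattern comes from the answer above.

```csharp
using System;
using System.Text.RegularExpressions;

static class ResultsStats
{
    // The lookbehind and lookahead assert the surrounding markers without
    // including them in the match, so m.Value is only the middle part.
    static readonly Regex Pattern = new Regex(
        @"(?<=id=resultsStats>).+?(?=<nobr>)",
        RegexOptions.Singleline);

    // Hypothetical helper: returns the text between the markers,
    // or null when the input does not contain them.
    public static string TryFind(string input)
    {
        Match m = Pattern.Match(input);
        return m.Success ? m.Value : null;
    }

    static void Main()
    {
        Console.WriteLine(TryFind("id=resultsStats>3<nobr>"));         // 3
        Console.WriteLine(TryFind("id=resultsStats>anything<nobr>"));  // anything
        Console.WriteLine(TryFind("no markers here") ?? "(no match)"); // (no match)
    }
}
```

Because of the lazy .+?, the match stops at the first <nobr> that follows the marker.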

I'm not a regex expert but something like this might work:
@"\>\*{1}\<"
This means "match a single asterisk between the lt/gt characters". You just need to make sure you escape the asterisk because it has special meaning in regular expressions.
Hope this helps!

If you are looking to capture a literal *, you need to escape it with a backslash. Note that inside a regular C# string literal you must escape the backslash as well, because "\*" is not a recognized escape sequence and will not compile. Alternatively, use a verbatim string (@"\*").
"\\*"

Try this:
using System;
using System.Text.RegularExpressions;
namespace SO6312611
{
    class Program
    {
        static void Main()
        {
            string input = "id=resultsStats>anything<nobr>";
            Regex r = new Regex("id=resultsStats>(?<data>[^<]*)<nobr>");
            Match m = r.Match(input);
            Console.WriteLine("Matched: >{0}<", m.Groups["data"]);
        }
    }
}

Related

How to make regex only match with patterns that have exactly one letter before a =

I am trying to get the regex to match only when there is exactly one letter from A-Z followed by a =, like A=, a=, B=. Currently it picks up any number of letters before the =, like hem=, ac2=. Usually ^[a-zA-Z] works just fine, but it's not working in this case since I'm using named capture groups.
String pattern = "FL2 (77) Flashing,77,a=1.875,A=90.0,b=3.625,B=95.0,c=1.375,C=175.0,d=2.5,hem=0.5,16GA-AL,";
var regex = new Regex("(?<label>[a-zA-Z]+)=(?<value>[^,]+)");
Other ways I've tried
var regex = new Regex("(?<label>^[a-zA-Z]+)=(?<value>[^,]+)");
var regex = new Regex("(?<label>[^a-zA-Z]+)=(?<value>[^,]+)");
If you want to match l= but not word=, you need a negative look-behind assertion.
new Regex("(?<![a-zA-Z])(?<label>[a-zA-Z])=(?<value>[^,]+)")
If the string pattern you have in your question is really the "haystack" in which you're looking for "needles", a really easy way to solve the problem would be to first split the string on ,, then use RegEx. Then you can use a simpler pattern ^(?<label>[a-zA-Z])=(?<value>.+)$ on each item in the list you get from splitting the string, and only keep the matches.
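A minimal sketch of the split-then-match approach, assuming the comma-separated string from the question is the haystack. The names SplitThenMatch and Parse are invented for this example.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

static class SplitThenMatch
{
    // Anchored pattern: each item must be exactly one letter, '=', then a value.
    static readonly Regex Item = new Regex(@"^(?<label>[a-zA-Z])=(?<value>.+)$");

    // Hypothetical helper: split on commas, keep only the items that match.
    public static List<KeyValuePair<string, string>> Parse(string haystack)
    {
        var pairs = new List<KeyValuePair<string, string>>();
        foreach (string part in haystack.Split(','))
        {
            Match m = Item.Match(part);
            if (m.Success)
            {
                pairs.Add(new KeyValuePair<string, string>(
                    m.Groups["label"].Value, m.Groups["value"].Value));
            }
        }
        return pairs;
    }

    static void Main()
    {
        string haystack = "FL2 (77) Flashing,77,a=1.875,A=90.0,b=3.625,B=95.0,c=1.375,C=175.0,d=2.5,hem=0.5,16GA-AL,";
        foreach (var pair in Parse(haystack))
        {
            Console.WriteLine("{0} -> {1}", pair.Key, pair.Value);
        }
    }
}
```

With the sample string this keeps the seven single-letter items (a, A, b, B, c, C, d) and drops hem=0.5 along with the items that have no = at all.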
It's because you have a + after [a-zA-Z], which makes it match one or more characters in that character class. If you remove the +, it will only match one character before the =.
If you want it to only match in situations where there is exactly one alphabetical character before the equals sign, you will want to add to the beginning of the regex to make sure that the character before the letter you want to match is not a letter, like this:
(?<![a-zA-Z])(?<label>[a-zA-Z])=(?<value>[^,]+)
(notice though that this only matters in the case where you don't put a ^ before [a-zA-Z], in the case where you want matches that don't start at the beginning of a line)
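To see the negative look-behind at work on the whole string, without splitting first, something like the following could be used. SingleLetterLabels and Labels are illustrative names, not part of any answer.

```csharp
using System;
using System.Linq;
using System.Text.RegularExpressions;

static class SingleLetterLabels
{
    // (?<![a-zA-Z]) rejects a label letter that is itself preceded by a letter,
    // so the "m" in "hem=" is not accepted as a one-letter label.
    static readonly Regex Pair = new Regex(
        @"(?<![a-zA-Z])(?<label>[a-zA-Z])=(?<value>[^,]+)");

    // Hypothetical helper: returns the label of every match, in order.
    public static string[] Labels(string input)
    {
        return Pair.Matches(input).Cast<Match>()
                   .Select(m => m.Groups["label"].Value)
                   .ToArray();
    }

    static void Main()
    {
        string input = "FL2 (77) Flashing,77,a=1.875,A=90.0,b=3.625,B=95.0,c=1.375,C=175.0,d=2.5,hem=0.5,16GA-AL,";
        Console.WriteLine(string.Join(" ", Labels(input))); // a A b B c C d
    }
}
```

Note that the look-behind only excludes a preceding letter. A label preceded by a digit (for example 2a=) would still match, so tighten the look-behind if that matters for your data.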
Have you tried
var regex = new Regex("(?<label>^[a-zA-Z]?)=(?<value>[^,]+)");
I believe the "+" means 1 or more
"?" means 0 or 1
or exactly 1 should be {1} (at least in python, not sure about C#)
var regex = new Regex("(?<label>^[a-zA-Z]{1})=(?<value>[^,]+)");
Assuming that the label is separated by a comma (which seems to be the case based on your example and code) then you can use:
(?:^|,)(?<label>[A-Za-z])=(?<value>[^,]+)
I recommend Regex.Matches over capture groups here:
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text.RegularExpressions;
namespace Rextester
{
    public class Program
    {
        public static void Main(string[] args)
        {
            string content = "FL2 (77) Flashing,77,a=1.875,A=90.0,b=3.625,B=95.0,c=1.375,C=175.0,d=2.5,hem=0.5,16GA-AL,";
            // Inside a character class | is a literal pipe, so it is not used as a separator here.
            const string regexPattern = "(?<=[, ])[a-zA-Z]=[0-9.-]+";
            string singleMatch = new Regex(regexPattern).Match(content).ToString();
            Console.WriteLine(singleMatch); // a=1.875
            MatchCollection matchList = Regex.Matches(content, regexPattern);
            var matches = matchList.Cast<Match>().Select(match => match.Value).ToList();
            Console.WriteLine(string.Join(", ", matches)); // a=1.875, A=90.0, b=3.625, B=95.0, c=1.375, C=175.0, d=2.5
        }
    }
}

Validate filename in c# through regex

I want to validate a filename with this format : LetterNumber_Enrollment_YYYYMMDD_HHMM.xml
string filename = "Try123_Enrollment_20130102_1200.xml";
Regex pattern = new Regex(@"[a-zA-z]_Enrollment_[0-9]{6}_[0-9]{4}\\.xml");
if (pattern.IsMatch(filename))
{
    return isValid = true;
}
However, I can't get it to work.
Is there anything I missed here?
You are not matching digits at the beginning. Your pattern should be: ^[A-Za-z0-9]+_Enrollment_[0-9]{8}_[0-9]{4}\.xml$ to match given string.
Changes:
Your string starts with an alphanumeric run before the first _ symbol, so you need to allow both letters and digits.
After the Enrollment_ part you have 8 digits, not 6.
No need for the double \. You only need to escape the dot (i.e. \.).
Demo app:
using System;
using System.Text.RegularExpressions;
class Test {
    static void Main() {
        string filename = "Try123_Enrollment_20130102_1200.xml";
        Regex pattern = new Regex(@"^[A-Za-z0-9]+_Enrollment_[0-9]{8}_[0-9]{4}\.xml$");
        if (pattern.IsMatch(filename))
        {
            Console.WriteLine("Matched");
        }
    }
}
Your Regex is nowhere near your actual string:
you only match a single letter at the start (and no digits) so Try123 doesn't match
you match 6 digits instead of 8 at the date part so 20130102 doesn't match
you have escaped your backslash near the end (\\.xml) but you've also used @ on your string: with @ you don't need to escape the backslash.
Try this instead:
@"[a-zA-Z]{3}\d{3}_Enrollment_[0-9]{8}_[0-9]{4}\.xml"
I've assumed you want only three letters and three numbers at the start; in fact you may want this:
@"[\w]*_Enrollment_[0-9]{8}_[0-9]{4}\.xml"
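You can try the following. It matches letters and digits at the beginning and also restricts the date to a plausible range (year 19xx or 20xx, month 01-12, day 01-31), although it does not reject impossible dates such as February 31.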
[A-Za-z0-9]+_Enrollment_(19|20)\d\d(0[1-9]|1[012])(0[1-9]|[12][0-9]|3[01])_[0-9]{4}\.xml
As an aside, to test your regular expressions try the free regular expression designer from Rad Software, I find that it helps me work out complex expressions beforehand.
http://www.radsoftware.com.au/regexdesigner/
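If the date and time really need to be valid, and not just the right shape, one option is to let the regex check the overall format and hand the timestamp to DateTime.TryParseExact. This is a sketch under that assumption, not something any of the answers proposed; the names EnrollmentFileName and IsValid and the "yyyyMMdd_HHmm" format string are chosen for this example.

```csharp
using System;
using System.Globalization;
using System.Text.RegularExpressions;

static class EnrollmentFileName
{
    // The regex only checks the shape and captures the timestamp part.
    static readonly Regex Shape = new Regex(
        @"^[A-Za-z0-9]+_Enrollment_(?<stamp>[0-9]{8}_[0-9]{4})\.xml$");

    // Hypothetical helper: true when the name has the expected shape
    // and the captured timestamp is a real calendar date and time.
    public static bool IsValid(string fileName)
    {
        Match m = Shape.Match(fileName);
        if (!m.Success)
        {
            return false;
        }
        DateTime parsed;
        return DateTime.TryParseExact(
            m.Groups["stamp"].Value, "yyyyMMdd_HHmm",
            CultureInfo.InvariantCulture, DateTimeStyles.None, out parsed);
    }

    static void Main()
    {
        Console.WriteLine(IsValid("Try123_Enrollment_20130102_1200.xml")); // True
        Console.WriteLine(IsValid("Try123_Enrollment_20130231_1200.xml")); // False (February 31)
        Console.WriteLine(IsValid("Try123_Enrollment_201301_1200.xml"));   // False (6-digit date)
    }
}
```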

Regularexpression C# how to

static void Main(string[] args)
{
    int count = 0;
    String s = "writeln('Helloa a') tung ('main')";
    String patern = @"\'+[\S+\s]*\'";
    Regex myRegex = new Regex(patern);
    foreach (Match regex in myRegex.Matches(s)) {
        Console.WriteLine(regex.Value.ToString());
    }
}
When run, it shows
'Helloa a') tung ('main'
That is not what I want.
I want it to print
'Helloa a'
'main'
Can you help me?
Try using this regexp:
@"\'[^']+\'"
It will print:
'Helloa a'
'main'
Add a ? after the * to make the * non-greedy:
@"\'+[\S+\s]*?\'"
http://rubular.com/r/tso5Uvc88v
REGEXPLANATION:
A greedy quantifier takes the largest possible string it can. In your case that is everything from the first single quote to the last one:
'Helloa a') tung ('main'
A non-greedy (lazy) quantifier takes the smallest possible section, which is what you wanted.
To make a + or * non-greedy, just put a ? after it.
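The difference can be checked side by side with the two patterns from this thread. The helper below (GreedyVsLazy.AllMatches) is only an illustration written for this comparison.

```csharp
using System;
using System.Linq;
using System.Text.RegularExpressions;

static class GreedyVsLazy
{
    // Hypothetical helper: returns the text of every match of pattern in input.
    public static string[] AllMatches(string input, string pattern)
    {
        return Regex.Matches(input, pattern).Cast<Match>()
                    .Select(m => m.Value)
                    .ToArray();
    }

    static void Main()
    {
        string s = "writeln('Helloa a') tung ('main')";

        // Greedy *: runs to the last quote in the string.
        Console.WriteLine(string.Join(" | ", AllMatches(s, @"\'+[\S+\s]*\'")));
        // 'Helloa a') tung ('main'

        // Lazy *?: stops at the first closing quote it can.
        Console.WriteLine(string.Join(" | ", AllMatches(s, @"\'+[\S+\s]*?\'")));
        // 'Helloa a' | 'main'
    }
}
```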
You can use a lazy quantifier. Replace * by *? as Sam I am suggests, or use this solution:
@"\'(?>[^']+|(?<=\\)')*\'"
that allows escaped quotes.
Details
(?>        open an atomic group
[^']+      anything that is not a quote, one or more times
|          OR
(?<=\\)'   a quote preceded by a backslash
)          close the atomic group
*          repeat the group zero or more times
See the documentation on atomic groups for more information.
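As a quick check of the escaped-quote behaviour, here is the atomic-group pattern applied to an input that contains \' inside a quoted section. The input string and the names EscapedQuotes and QuotedParts are invented for this example.

```csharp
using System;
using System.Linq;
using System.Text.RegularExpressions;

static class EscapedQuotes
{
    // Pattern from the answer above: a quote, then any run of non-quotes
    // or backslash-preceded quotes, then the closing quote.
    static readonly Regex Quoted = new Regex(@"\'(?>[^']+|(?<=\\)')*\'");

    // Hypothetical helper: returns every quoted section, quotes included.
    public static string[] QuotedParts(string input)
    {
        return Quoted.Matches(input).Cast<Match>()
                     .Select(m => m.Value)
                     .ToArray();
    }

    static void Main()
    {
        // Verbatim string, so the backslash before the inner quote is literal.
        string s = @"say('it\'s') and ('main')";
        Console.WriteLine(string.Join(" | ", QuotedParts(s))); // 'it\'s' | 'main'
    }
}
```

The simpler '[^']+' pattern would stop at the escaped quote and return 'it\' for the first section.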
I take it you only want to capture anything within the quotes and parentheses? Try this: \(\'(.+?)\'\)

Regex to extract Variable Part

I have a string containing this: @[User::RootPath]+"Dim_MyPackage10.dtsx" and I need to extract the [User::RootPath] part using a regex. So far I have this regex: [a-zA-Z0-9]*\.dtsx but I don't know how to proceed further.
For the variable, why not consume just what is needed by using a negated set, [^ ], which matches everything except the characters in the set?
The ^ inside the brackets means "match anything not listed", so here we seek everything that is not a ] or a quote (").
Then we can place the actual matches in named capture groups (?<{NameHere}> ) and extract them accordingly.
string pattern = @"(?:@\[)(?<Path>[^\]]+)(?:\]\+\"")(?<File>[^\""]+)(?:"")";
// Pattern is (?:@\[)(?<Path>[^\]]+)(?:\]\+\")(?<File>[^\"]+)(?:")
// w/o the "'s escapes for the C# parser
string text = @"@[User::RootPath]+""Dim_MyPackage10.dtsx""";
var result = Regex.Match(text, pattern);
Console.WriteLine("Path: {0}{1}File: {2}",
    result.Groups["Path"].Value,
    Environment.NewLine,
    result.Groups["File"].Value);
/* Outputs
Path: User::RootPath
File: Dim_MyPackage10.dtsx
*/
(?: ) means match but don't capture. We use those groups as de facto anchors for the pattern and keep them out of the capture groups.
Use this regex pattern:
\[[^[\]]*\]
Your regex will match any number of alphanumeric characters, followed by .dtsx. In your example, it would match MyPackage10.dtsx.
If you want to match Dim_MyPackage10.dtsx you need to add an underscore to the list of allowed characters in the regex: [a-zA-Z0-9_]*\.dtsx
If you want to match the [User::RootPath], you need a regex that will stop at the last / (or \, depends on which type of slashes you use in the paths): something like this: .*\/ (or .*\\)
From the answers and comments - and the fact that none has been 'accepted' so far - it appears to me that the question/problem is not completely clear. If you're looking for the pattern [User::SomeVariable] where only 'SomeVariable' is, well, variable, then you may try:
\[User::\w+]
to capture the full expression.
Furthermore, if you wish to detect that pattern, but then need only the "SomeVariable" part, you may try:
(?<=\[User::)\w+(?=])
which uses look-arounds.
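Both variants can be tried on the question's string as follows. This is only a sketch; the class and method names (UserVariable, FullExpression, NameOnly) are made up.

```csharp
using System;
using System.Text.RegularExpressions;

static class UserVariable
{
    // Matches the whole bracketed expression, e.g. [User::RootPath].
    public static string FullExpression(string input)
    {
        return Regex.Match(input, @"\[User::\w+]").Value;
    }

    // Look-arounds assert "[User::" before and "]" after without consuming
    // them, so only the variable name is returned.
    public static string NameOnly(string input)
    {
        return Regex.Match(input, @"(?<=\[User::)\w+(?=])").Value;
    }

    static void Main()
    {
        string input = "@[User::RootPath]+\"Dim_MyPackage10.dtsx\"";
        Console.WriteLine(FullExpression(input)); // [User::RootPath]
        Console.WriteLine(NameOnly(input));       // RootPath
    }
}
```

Match.Value is an empty string when nothing matches, so check Match.Success instead if you need to tell "no match" apart from an empty result.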
Here it is bro
using System;
using System.Text.RegularExpressions;
namespace myapp
{
    class Class1
    {
        static void Main(string[] args)
        {
            String sourcestring = "source string to match with pattern";
            Regex re = new Regex(@"\[\S+\]");
            MatchCollection mc = re.Matches(sourcestring);
            int mIdx = 0;
            foreach (Match m in mc)
            {
                for (int gIdx = 0; gIdx < m.Groups.Count; gIdx++)
                {
                    Console.WriteLine("[{0}][{1}] = {2}", mIdx, re.GetGroupNames()[gIdx], m.Groups[gIdx].Value);
                }
                mIdx++;
            }
        }
    }
}

How to treat Regular Expression as plain text while using .Net Regex Class?

How to escape all regex characters automatically using built-in .NET mechanism or something like that?
I am using Regex class to find match for a regular expression. See example below.
string pattern = "abc";
Regex regexp = new Regex(@"a\w", RegexOptions.None);
if (regexp.IsMatch(pattern))
{
    MessageBox.Show("Found");
}
So, here Found will hit.
Now, in some cases, I still want to use the Regex class but treat the regular expression as a plain string. For example, if I change the pattern string to @"a\w", the Regex class should still find a match.
string pattern = @"a\w";
Regex regexp = new Regex(@"a\w", RegexOptions.None);
if (regexp.IsMatch(pattern))
{
    MessageBox.Show("Found");
}
In the above case also, "Found" should hit.
So, the question is how to convert or treat Regular Expression into/as a plain string or something like that which can be used in a Regex Class? How to achieve the above code snippet scenario?
Note: - I do not want to use string.Contains, string.IndexOf, etc for plain text string matching.
You can "de-regex" your string through Regex.Escape. This will transform your pattern-to-search-for into the correctly escaped regex version.
string pattern = @"a\w";
Regex regexp = new Regex(Regex.Escape(@"a\w"), RegexOptions.None);
if (regexp.IsMatch(pattern))
{
    MessageBox.Show("Found");
}
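To see what Regex.Escape actually produces and how it changes the result, a small console check along these lines may help. LiteralSearch and ContainsLiteral are illustrative names; Regex.Escape and Regex.IsMatch are the real framework methods.

```csharp
using System;
using System.Text.RegularExpressions;

static class LiteralSearch
{
    // Hypothetical helper: treats "literal" as plain text by escaping every
    // regex metacharacter in it before handing it to the regex engine.
    public static bool ContainsLiteral(string input, string literal)
    {
        return Regex.IsMatch(input, Regex.Escape(literal));
    }

    static void Main()
    {
        Console.WriteLine(Regex.Escape(@"a\w"));            // a\\w
        Console.WriteLine(ContainsLiteral(@"a\w", @"a\w")); // True
        Console.WriteLine(ContainsLiteral("abc", @"a\w"));  // False
        Console.WriteLine(Regex.IsMatch("abc", @"a\w"));    // True (unescaped: \w is a word character)
    }
}
```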
In the second case you don't want to search for what \w means in regex, you want to search for the literal \w, so you'd want:
string pattern = @"a\w";
Regex regexp = new Regex(@"a\\w", RegexOptions.None);
if (regexp.IsMatch(pattern))
{
    MessageBox.Show("Found");
}
(Note the escaping).
You just need to escape your special characters like: @"a\\w"
