Regex wrongly accepting all strings - c#

I am trying to find substrings that have a certain format. The substring format is [CENAOD(xyz)]. I wrote the following code, but when I run it in a loop it reports that every input matches, which is wrong. What have I done wrong?
string strRegex = @"(\[CENAOD\((\S|\W)*\)\])*";
string strCenaOd = sReader["intro"].ToString();
if (Regex.IsMatch(strCenaOd, strRegex, RegexOptions.IgnoreCase))
{
// here I want to read the content of ( ), i.e. xyz in the example
}

Remove the outer ( ... )*.
The * allows zero repetitions, so the pattern also matches the empty string, and IsMatch succeeds on any input.
Alternatively, use + instead of *.

Adding to @Kent's and @leppie's answers, the code surrounding the regex needs work, too. I think this is what you were trying for:
string strRegex = @"\[CENAOD\(([^)]*)\)\]";
string strCenaOd = sReader["intro"].ToString();
Match m = Regex.Match(strCenaOd, strRegex, RegexOptions.IgnoreCase);
if (m.Success)
{
string content = m.Groups[1].Value;
// ...
}
IsMatch() is a simple yes-or-no check; it doesn't provide any way to retrieve the matched text.
I especially want to comment on (\S|\W)* from your regex. First, \S|\W is a very inefficient way to match any character. A dot (.) is usually all you need, but as Kent pointed out, [^)] (i.e., any character except a closing parenthesis) is more appropriate in this case. Also, by placing the * outside the round brackets, you'll only ever capture the last character; ([^)]*) captures all of them.
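To see both points in action, here is a small sketch. The sample inputs and variable names are made up for illustration, and the code assumes C# 9 or later for top-level statements (in older projects, put the body inside Main).

```csharp
using System;
using System.Text.RegularExpressions;

// Hypothetical sample inputs, not taken from the question's data source.
string noTag = "plain text without the tag";
string withTag = "intro [CENAOD(xyz)] more text";

// Original pattern: the outer ( ... )* allows zero repetitions,
// so an empty match succeeds on any input.
string original = @"(\[CENAOD\((\S|\W)*\)\])*";
bool originalMatchesNoTag = Regex.IsMatch(noTag, original, RegexOptions.IgnoreCase);

// Corrected pattern: no outer quantifier, and the * sits inside the capturing group.
string corrected = @"\[CENAOD\(([^)]*)\)\]";
bool correctedMatchesNoTag = Regex.IsMatch(noTag, corrected, RegexOptions.IgnoreCase);

Match m = Regex.Match(withTag, corrected, RegexOptions.IgnoreCase);
string content = m.Success ? m.Groups[1].Value : "";

// With the * outside the group, the group keeps only its last iteration.
Match lastOnly = Regex.Match(withTag, @"\[CENAOD\((\S|\W)*\)\]", RegexOptions.IgnoreCase);
string lastChar = lastOnly.Groups[1].Value;

Console.WriteLine(originalMatchesNoTag);  // True
Console.WriteLine(correctedMatchesNoTag); // False
Console.WriteLine(content);               // xyz
Console.WriteLine(lastChar);              // z
```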

If you said "all strings", how about:
\[CENAOD\([^\)]*\)\]

Related

Regex to get values from a string using C#

I posted this earlier but did not give clear information about what I was trying to achieve.
I am trying to get values from a string using Regex in C#. I don't understand why I can get some values but not others using a similar approach.
Please find the code snippet below.
Kindly let me know what I am missing.
Thanks in advance.
string text = "0*MAO-001*20160409*20160408*Encounter Data Duplicates Report * *ENC000200800400120160407*PRO*PROD*";
// to get the value 20160409 from the above text
// this code works fine
Regex pattern = new Regex(@"([0][*]MAO[-][0][0][1].*?[*](?<Value>\d+)[*])");
Match match = pattern.Match(text);
string Value = match.Groups["Value"].Value.ToString();
// to get the value ENC000200800400120160407 from the above text
// this does not work and gives me nothing
Regex pattern2 = new Regex(@"([0][*]MAO[-][0][0][1].*?[*].*?[*].*?[*].*?[*].*?[*](?<Value2>\d+)[*])");
Match match2 = pattern.Match(text);
string Value2 = match.Groups["Value2"].Value.ToString();
It looks like your data is '*'-delimited.
You can use a single regex to capture all the values.
Try using
((?<values>[^\*]+)\*)
as your pattern.
Each field is then captured in the values group of a separate match.
Update: added C# code
string text = "0*MAO-001*20160409*20160408*Encounter Data Duplicates Report * *ENC000200800400120160407*PRO*PROD*";
Regex pattern = new Regex(@"(?<values>[^\*]+)\*");
var matches = pattern.Matches(text);
string Value = matches[2].Groups["values"].Value; // third field: 20160409
string Value2 = matches[6].Groups["values"].Value; // seventh field: ENC000200800400120160407
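Because every field ends with '*', a plain string.Split is another option when no regex is required. This swaps in a different technique from the regex approach above; it is only a sketch, with illustrative variable names, written as top-level statements (C# 9 or later assumed).

```csharp
using System;

string text = "0*MAO-001*20160409*20160408*Encounter Data Duplicates Report * *ENC000200800400120160407*PRO*PROD*";

// Split on the delimiter; the trailing '*' yields an empty last element.
string[] fields = text.Split('*');

string firstValue = fields[2];  // third field
string secondValue = fields[6]; // seventh field

Console.WriteLine(firstValue);  // 20160409
Console.WriteLine(secondValue); // ENC000200800400120160407
```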
You need to use this for the second regex:
([0][*]MAO[-][0][0][1].*?[*].*?[*].*?[*].*?[*].*?[*](?<Value2>\w+)[*])
\w matches any word character (letters, digits, and underscore). You were using \d, which matches only digits, but the field you want also contains letters.
In your second try at using the regex, you are matching with pattern and not pattern2.
Match match2 = pattern.Match(text);
string Value2 = match.Groups["Value2"].Value.ToString();
You are also using the Groups from match and not match2.
This is why it is important to give your variables names that reflect what they represent. Yes, it may be a "pattern", but what does that pattern represent? Vaguely named variables invite mistakes like these.
You almost got it, but the field you're looking for contains both letters and digits.
Here is your second regex, more or less fixed:
([0][*]MAO[-][0][0][1].*?[*](?:.*?[*]){4}(?<Value2>.*?)[*])
( # (1 start)
[0] [*] MAO [-] [0] [0] [1] .*? [*]
(?: .*? [*] ){4}
(?<Value2> .*? ) # (2)
[*]
) # (1 end)
To make it a little less busy, this might be better:
(0\*MAO-001.*?\*(?:[^*]*\*){4}(?<Value2>[^*]*)\*)
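As a rough sketch of how the simplified pattern could be used from C#, with the outer group dropped and variable names chosen for illustration (top-level statements, so C# 9 or later is assumed):

```csharp
using System;
using System.Text.RegularExpressions;

string text = "0*MAO-001*20160409*20160408*Encounter Data Duplicates Report * *ENC000200800400120160407*PRO*PROD*";

// Consume the delimiter after MAO-001, skip four fields, then capture the next one.
Regex value2Pattern = new Regex(@"0\*MAO-001.*?\*(?:[^*]*\*){4}(?<Value2>[^*]*)\*");
Match value2Match = value2Pattern.Match(text);

// Read the group from the same match object that was just produced.
string value2 = value2Match.Success ? value2Match.Groups["Value2"].Value : "";
Console.WriteLine(value2); // ENC000200800400120160407
```

Naming the pattern and the match after the value they extract makes it harder to mix them up with the first pair.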

Pattern Matching c#

Let's say I have a text file containing the line below. I want to extract both values inside the quotation marks by matching between (" and "), so that I retrieve ABC and DEF and put them in a list of strings or something similar. What's the best way of doing this? It's so annoying.
If EXAMPLEA("ABC") AND EXAMPLEB("DEF")
Assuming the value between the double quotes cannot contain escaped double quotes, something like this might work:
var text = "If EXAMPLEA(\"ABC\") AND EXAMPLEB(\"DEF\")";
Regex pattern = new Regex("\"[^\"]*\"");
foreach (Match match in pattern.Matches(text))
{
Console.WriteLine(match.Value.Trim('"'));
}
But this is only one of the many ways you could do it and maybe not the smartest way out there. Try something yourself!
Best way...
List<string> matches=Regex.Matches(File.ReadAllText(yourPath),@"(?<=\("")[^""]*(?=""\))")
.Cast<Match>()
.Select(x=>x.Value)
.ToList();
This pattern should do the trick:
\"([^"]*)\"
string str = "If EXAMPLEA(\"ABC\") AND EXAMPLEB(\"DEF\")";
MatchCollection matched = Regex.Matches(str, @"\""([^\""]*)\""");
foreach (Match match in matched)
{
Console.WriteLine(match.Groups[1].Value);
}
Note that the quotation marks are doubled in the actual code in order to escape them. And the code refers to group [1] to get just the part inside the parentheses.
IEnumerable<string> matches =
from Match match
in Regex.Matches(File.ReadAllText(filepath), @"\""([^\""]*)\""")
select match.Groups[1].Value;
Others have already posted answers, but mine takes into account that you want just ABC and DEF from your example, without the quotation marks, saved in an IEnumerable<string>.

match first digits before # symbol

How can I match the first digits before # in this line?
26909578#Sbrntrl_7x06-lilla.avi#356028416#2012-10-24 09:06#0#http://bitshare.com/files/dvk9o1oz/Sbrntrl_7x06-lilla.avi.html#[URL=http://bitshare.com/files/dvk9o1oz/Sbrntrl_7x06-lilla.avi.html]http://bitshare.com/files/dvk9o1oz/Sbrntrl_7x06-lilla.avi.html[/URL]#http://bitshare.com/files/dvk9o1oz/Sbrntrl_7x06-lilla.avi.html#http://bitshare.com/?f=dvk9o1oz#http://bitshare.com/delete/dvk9o1oz/4511e6f3612961f961a761adcb7e40a0/Sbrntrl_7x06-lilla.avi.html
I'm trying to get this number: 26909578
My attempt:
string text = @"26909578#Sbrntrl_7x06-lilla.avi#356028416#2012-10-24 09:06#0#http://bitshare.com/files/dvk9o1oz/Sbrntrl_7x06-lilla.avi.html#[URL=http://bitshare.com/files/dvk9o1oz/Sbrntrl_7x06-lilla.avi.html]http://bitshare.com/files/dvk9o1oz/Sbrntrl_7x06-lilla.avi.html[/URL]#http://bitshare.com/files/dvk9o1oz/Sbrntrl_7x06-lilla.avi.html#http://bitshare.com/?f=dvk9o1oz#http://bitshare.com/delete/dvk9o1oz/4511e6f3612961f961a761adcb7e40a0/Sbrntrl_7x06-lilla.avi.html";
MatchCollection m1 = Regex.Matches(text, @"(.+?)#", RegexOptions.Singleline);
But it outputs all of the text.
Make it explicit that it has to start at the beginning of the string:
@"^(.+?)#"
Alternatively, if you know that this will always be a number, restrict the possible characters to digits:
@"^\d+"
Alternatively use the function Match instead of Matches. Matches explicitly says, "give me all the matches", while Match will only return the first one.
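A short sketch comparing these options. The input is a shortened, hypothetical version of the line from the question, and the code uses top-level statements (C# 9 or later assumed).

```csharp
using System;
using System.Text.RegularExpressions;

// Shortened, hypothetical version of the line from the question.
string text = "26909578#Sbrntrl_7x06-lilla.avi#356028416#2012-10-24 09:06#0#tail";

// Anchored and restricted to digits: only the leading number can match.
string number = Regex.Match(text, @"^\d+").Value;

// Unanchored with Matches: every '#'-terminated segment is a separate match.
int segmentCount = Regex.Matches(text, @"(.+?)#").Count;

// Match returns only the first of those segments; group 1 excludes the '#'.
string firstSegment = Regex.Match(text, @"(.+?)#").Groups[1].Value;

Console.WriteLine(number);       // 26909578
Console.WriteLine(segmentCount); // 5
Console.WriteLine(firstSegment); // 26909578
```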
Or, in a trivial case like this, you might also consider a non-RegEx approach. The IndexOf() method will locate the '#' and you could easily strip off what came before.
I even wrote a sscanf() replacement for C#, which you can see in my article A sscanf() Replacement for .NET.
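The IndexOf() approach could look roughly like this. It is a minimal sketch on a shortened, hypothetical input, written as top-level statements (C# 9 or later assumed).

```csharp
using System;

// Shortened, hypothetical version of the line from the question.
string text = "26909578#Sbrntrl_7x06-lilla.avi#356028416";

int hashIndex = text.IndexOf('#');

// IndexOf returns -1 when there is no '#'; fall back to the whole string then.
string number = hashIndex >= 0 ? text.Substring(0, hashIndex) : text;

Console.WriteLine(number); // 26909578
```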
If you don't want to use regex, use a StringBuilder and just loop until you hit the #.
Like this:
StringBuilder sb = new StringBuilder();
string yourdata = "yourdata";
int i = 0;
// stop at the first '#', or at the end of the string if there is none
while (i < yourdata.Length && yourdata[i] != '#')
{
sb.Append(yourdata[i]);
i++;
}
// sb now holds everything before the first '#', i.e. the number you want
string answer = sb.ToString();
The entire string (except the final URL) is composed of segments that can be matched by (.+?)#, so you will get several matches. Retrieve only the first match from the collection returned by matching .+?(?=#).

question about regex

I've got this code:
s = Regex.Match(item.Value, @"\/>.*?\*>", RegexOptions.IgnoreCase).Value;
It returns a string like '/>test*>'. I could strip the '/>' and '*>' symbols with Replace, but how can I get just the string 'test' between them, without those symbols?
You can capture part of the match by putting parentheses around it. So, for your example:
// item.Value == "/>test*>"
Match m = Regex.Match(item.Value, @"\/>(.*?)\*>");
Console.WriteLine(m.Groups[0].Value); // prints the entire match, "/>test*>"
Console.WriteLine(m.Groups[1].Value); // prints the first captured group, "test"
I also removed RegexOptions.IgnoreCase because the pattern contains no letters. What would an uppercase /> look like? :)
You can use a named group inside the regex and read it from the match:
var match = Regex.Match(item.Value, @"\/>(?<groupName>.*?)\*>", RegexOptions.IgnoreCase);
var data = match.Groups["groupName"].Value;
You can also use look-ahead and look-behind. For your example it would be:
var value = Regex.Match(item.Value, @"(?<=\/>).*?(?=\*>)").Value;

Regex.Split, how to read left of the matched pattern

I am trying to convert a Perl script to a C# 3.5 routine.
The perl code I have is:
if($work =~ /\<[0-9][0-9][0-9]\>/){
$left = $`;
$match = $&;
$work = $';
}
In C# I wrote the following code:
string[] sSplit = Regex.Split(work, @"\<[0-9][0-9][0-9]\>");
if (sSplit.Length > 2)
{
left = sSplit[0];
match = sSplit[1];
work = sSplit[2];
}
However the above is not giving me the matched pattern in sSplit[1], but the content to the right of the matched string instead.
Regex.Split is not what you need. The equivalent to =~ /.../ is Regex.Match.
However, Regex.Match has no equivalent to Perl’s $` or $', so you need to use a workaround, but I think it’s a fair one:
var m = Regex.Match(work, @"^(.*?)(\<[0-9][0-9][0-9]\>)(.*)$", RegexOptions.Singleline);
if (m.Success)
{
left = m.Groups[1].Value; // Groups[0] is the entire match
match = m.Groups[2].Value; // perhaps with Convert.ToInt32()?
work = m.Groups[3].Value;
}
Alternatively, you can use the match index and length to get the stuff:
var m = Regex.Match(work, @"\<[0-9][0-9][0-9]\>");
if (m.Success)
{
left = work.Substring(0, m.Index);
match = m.Value; // perhaps with Convert.ToInt32()?
work = work.Substring(m.Index + m.Length);
}
When experimenting with regular expressions, I always recommend RegexHero, an online tool that visualizes your .NET regular expressions. In this case, use Regex.Match and its Groups; that'll give you what you want.
Note that the backslashes in \< and \> are not needed in C# (nor in Perl, by the way).
Also note that $`, $& and $' have equivalents in C# when used in a replacement expression. If that's what you need in the end, you can use these "magic variables", but only in Regex.Replace.
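For illustration, here is a minimal sketch of those substitutions in Regex.Replace. The input string is hypothetical, and the code uses top-level statements (C# 9 or later assumed).

```csharp
using System;
using System.Text.RegularExpressions;

string work = "abc<123>def"; // hypothetical input

// In a .NET replacement pattern, $` is the text before the match,
// $& is the match itself, and $' is the text after the match.
string result = Regex.Replace(work, @"<[0-9][0-9][0-9]>", "[$`|$&|$']");

Console.WriteLine(result); // abc[abc|<123>|def]def
```

Only the matched part is replaced, so the text before and after the match still appears around the bracketed replacement.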
A split is usually asking to throw away the delimiters. Perl acts just the same way (without the verboten $& type variables.)
You capture delimiters in Perl by putting parens around them:
my @parts = split /(<[0-9][0-9][0-9]>)/; # includes the delimiter
my @parts = split /<[0-9][0-9][0-9]>/; # doesn't include the delimiter
