Regex to get values from a string using C# - c#

I have posted this earlier but did not give clear information on what i was trying to achieve.
I am trying get values from a string using Regex in c#. I am not able to understand why some values i could get and some i can not using a similar approach.
Please find the code snippet below.
Kindly let me know what i am missing.
Thanks in advance.
string text = "0*MAO-001*20160409*20160408*Encounter Data Duplicates Report * *ENC000200800400120160407*PRO*PROD*";
//toget the value 20160409 from the above text
//this code works fine
Regex pattern = new Regex(#"([0][*]MAO[-][0][0][1].*?[*](?<Value>\d+)[*])");
Match match = pattern.Match(text);
string Value = match.Groups["Value"].Value.ToString();
//to get the value ENC000200800400120160407 from the above text
// this does not work and gives me nothing
Regex pattern2 = new Regex(#"([0][*]MAO[-][0][0][1].*?[*].*?[*].*?[*].*?[*].*?[*](?<Value2>\d+)[*])");
Match match2 = pattern.Match(text);
string Value2 = match.Groups["Value2"].Value.ToString();

It looks your file is '*' delimitered.
You can use one single regex to catch all the values
Try use
((?<values>[^\*]+)\*)
as your pattern.
All these values will be catched in values array.
----Update add c# code-----
string text = "0*MAO-001*20160409*20160408*Encounter Data Duplicates Report * *ENC000200800400120160407*PRO*PROD*";
Regex pattern = new Regex(#"(?<values>[^\*]+)\*");
var matches = pattern.Matches(text);
string Value = matches[3].Groups["values"].Captures[0];
string Value2 = matches[6].Groups["values"].Captures[0];

You need to use this for 2nd regex
([0][*]MAO[-][0][0][1].*?[*].*?[*].*?[*].*?[*].*?[*](?<Value2>\w+)[*])
\w is any character from set [A-Za-z0-9_]. You were using only \d which searches for digits [0-9] which was not the case
C# Code

In your second try at using the regex, you are matching with pattern and not pattern2.
Match match2 = pattern.Match(text);
string Value2 = match.Groups["Value2"].Value.ToString();
You are also using the Groups from match and not match2.
This is why it is important to name your variables something meaningful to what they represent. Yes it may be a "pattern" but what does that pattern represent. When you use variables that are vaguely named it creates issues like these.

You almost got it, but the field you're looking for contains letters and digits.
This is your second regex kind of fixed.
([0][*]MAO[-][0][0][1].*?[*](?:.*?[*]){4}(?<Value2>.*?)[*])
( # (1 start)
[0] [*] MAO [-] [0] [0] [1] .*? [*]
(?: .*? [*] ){4}
(?<Value2> .*? ) # (2)
[*]
) # (1 end)
To make it a little less busy, this might be better
(0\*MAO-001.*?\*(?:[^*]*\*){4}(?<Value2>[^*]*)\*)

Related

Regular Expression to find string and set as variable

I am trying to make a regular expression that will tell me if a string has {0#} where zero can be repeated. Once I confirm that a string has this I am then trying to set it to a variable so I can count the number of 0s and replace the # with another number. I have /([{0]})([#}])/g which works on detection but not on pulling it out to another variable.
Edit:
Thanks to all, the answer was
Regex regex = new Regex(#"\{(0+)(#)\}");
Match match = regex.Match(text);
if (match.Success)
{
int zeros = Regex.Matches(match.Value, "0").Count;
}
Use this:
\{(0+)(#)\}
character {
then one or more occurance of 0
a # sign
character }
Live Demo
You are super close. The problem you are having is because your capture group - the ( ) needs to be just around the zeroes. You also don't strictly need the other capture group unless you are doing something with it. You can rewrite your regex like this:
{(0+)#}
{ - match '{'
(0+) - match and capture one or more '0'
# - match '#'
} - match '}'

Error when counting a specific word in a string with Regex.Matches

I tried to find a method to count a specific word in a string, and I found this.
using System.Text.RegularExpressions;
MatchCollection matches = Regex.Matches("hi, hi, everybody.", ",");
int cnt = matches.Count;
Console.WriteLine(cnt);
It worked fine, and the result shows 2.
But when I change "," to ".", it shows 18, not the expected 1. Why?
MatchCollection matches = Regex.Matches("hi, hi, everybody.", ".");
and when I change "," to "(", it shows me an error!
the error reads:
SYSTEM.ARGUMENTEXCEPTION - THERE ARE TOO MANY (...
I don't understand why this is happening
MatchCollection matches = Regex.Matches("hi( hi( everybody.", "(");
Other cases seem to work fine but I need to count "(".
The first instance, with the ., is using a special character which has a different meaning in regular expressions. It is matching ALL of the characters you have; hence you getting a result of 18.
http://www.regular-expressions.info/dot.html
To match an actual "." character, you'll need to "escape" it so that it is read as a full-stop and not a special character.
MatchCollection matches = Regex.Matches("hi, hi, everybody.", "\.");
The same exists for the ( character. It's a special character that has a different meaning in terms of regular expressions and you will need to escape it.
MatchCollection matches = Regex.Matches("hi( hi( everybody.", "\(");
Looks like you're new to regular expressions so I'd suggest reading, the link I posted above is a good start.
HOWEVER!
If you are looking to just count ocurences in a string, you don't need regex.
How would you count occurrences of a string within a string?
If you're using .NET 3.5 you can do this in a one-liner with LINQ:
int cnt = source.Count(f => f == '(');
If you don't want to use LINQ you can do it with:
int cnt = source.Split('(').Length - 1;
The second parameter represents a pattern, not necessarily just a character to search for in your string, and the ( by itself is an invalid pattern.
You don't need Regex to count occurrences of a character. Just use LINQ's Count():
var input = "hi( hi( everybody.";
var occurrences = input.Count(x => x == '('); // 2
( character is a special character which means start of a group. If you need to use ( as literal you need to escape it with \(. That should solve your problem.

Regex accepting all strings, wronly

I am trying to substrings if they have certain format. Substring Regex query is [CENAOD(xyx)]. I have done following code but when running this in cycle it says all results match which is wrong. Where I've done something wrong?
string strRegex = #"(\[CENAOD\((\S|\W)*\)\])*";
string strCenaOd = sReader["intro"].ToString()
if (Regex.IsMatch(strCenaOd, strRegex, RegexOptions.IgnoreCase))
{
string = (want to read content of ( ) = xyz in example)
}
Remove the outer ( ... )*.
That says no match is a good match too.
Or use + instead of *.
Adding to #Kent's and #leppie's answers, the code surrounding the regex needs work, too. I think this is what you were trying for:
string strRegex = #"\[CENAOD\(([^)]*)\)\]";
string strCenaOd = sReader["intro"].ToString();
Match m = Regex.Match(strCenaOd, strRegex, RegexOptions.IgnoreCase);
if (m.Success)
{
string content = m.Groups[1];
// ...
}
IsMatch() is a simple yes-or-no check, it doesn't provide any way to retrieve the matched text.
I especially want to comment on (\S|\W)*, from your regex. First, \S|\W is a very inefficient way to match any character. . is usually all you need, but as Kent pointed out, [^)] (i.e., any character except )) is more appropriate in this case. Also, by placing the * outside the round brackets, you'll only ever capture the last character. ([^)]*) captures all of them. For more details, read this.
if you said "all strings", how about:
\[CENAOD\([^\)]*\)\]

RegEx Problem using .NET

I have a little problem on RegEx pattern in c#. Here's the rule below:
input: 1234567
expected output: 123/1234567
Rules:
Get the first three digit in the input. //123
Add /
Append the the original input. //123/1234567
The expected output should looks like this: 123/1234567
here's my regex pattern:
regex rx = new regex(#"((\w{1,3})(\w{1,7}))");
but the output is incorrect. 123/4567
I think this is what you're looking for:
string s = #"1234567";
s = Regex.Replace(s, #"(\w{3})(\w+)", #"$1/$1$2");
Instead of trying to match part of the string, then match the whole string, just match the whole thing in two capture groups and reuse the first one.
It's not clear why you need a RegEx for this. Why not just do:
string x = "1234567";
string result = x.Substring(0, 3) + "/" + x;
Another option is:
string s = Regex.Replace("1234567", #"^\w{3}", "$&/$&"););
That would capture 123 and replace it to 123/123, leaving the tail of 4567.
^\w{3} - Matches the first 3 characters.
$& - replace with the whole match.
You could also do #"^(\w{3})", "$1/$1" if you are more comfortable with it; it is better known.
Use positive look-ahead assertions, as they don't 'consume' characters in the current input stream, while still capturing input into groups:
Regex rx = new Regex(#"(?'group1'?=\w{1,3})(?'group2'?=\w{1,7})");
group1 should be 123, group2 should be 1234567.

Regex to match multiple strings

I need to create a regex that can match multiple strings. For example, I want to find all the instances of "good" or "great". I found some examples, but what I came up with doesn't seem to work:
\b(good|great)\w*\b
Can anyone point me in the right direction?
Edit: I should note that I don't want to just match whole words. For example, I may want to match "ood" or "reat" as well (parts of the words).
Edit 2: Here is some sample text: "This is a really great story."
I might want to match "this" or "really", or I might want to match "eall" or "reat".
If you can guarantee that there are no reserved regex characters in your word list (or if you escape them), you could just use this code to make a big word list into #"(a|big|word|list)". There's nothing wrong with the | operator as you're using it, as long as those () surround it. It sounds like the \w* and the \b patterns are what are interfering with your matches.
String[] pattern_list = whatever;
String regex = String.Format("({0})", String.Join("|", pattern_list));
(good)*(great)*
after your edit:
\b(g*o*o*d*)*(g*r*e*a*t*)*\b
I think you are asking for smth you dont really mean
if you want to search for any Part of the word, you litterally searching letters
e.g. Search {Jack, Jim} in "John and Shelly are cool"
is searching all letters in the names {J,a,c,k,i,m}
*J*ohn *a*nd Shelly *a*re
and for that you don't need REG-EX :)
in my opinion,
A Suffix Tree can help you with that
http://en.wikipedia.org/wiki/Suffix_tree#Functionality
enjoy.
I don't understand the problem correctly:
If you want to match "great" or "reat" you can express this by a pattern like:
"g?reat"
This simply says that the "reat"-part must exist and the "g" is optional.
This would match "reat" and "great" but not "eat", because the first "r" in "reat" is required.
If you have the too words "great" and "good" and you want to match them both with an optional "g" you can write this like this:
(g?reat|g?ood)
And if you want to include a word-boundary like:
\b(g?reat|g?ood)
You should be aware that this would not match anything like "breat" because you have the "reat" but the "r" is not at the word boundary because of the "b".
So if you want to match whole words that contain a substring link "reat" or "ood" then you should try:
"\b\w*?(reat|ood)\w+\b"
This reads:
1. Beginning with a word boundary begin matching any number word-characters, but don't be gready.
2. Match "reat" or "ood" enshures that only those words are matched that contain one of them.
3. Match any number of word characters following "reat" or "ood" until the next word boundary is reached.
This will match:
"goodness", "good", "ood" (if a complete word)
It can be read as: Give me all complete words that contain "ood" or "reat".
Is that what you are looking for?
I'm not entirely sure that regex alone offers a solution for what you're trying to do. You could, however, use the following code to create a regex expression for a given word. Although, the resulting regex pattern has the potential to become very long and slow:
function wordPermutations( $word, $minLength = 2 )
{
$perms = array( );
for ($start = 0; $start < strlen( $word ); $start++)
{
for ($end = strlen( $word ); $end > $start; $end--)
{
$perm = substr( $word, $start, ($end - $start));
if (strlen( $perm ) >= $minLength)
{
$perms[] = $perm;
}
}
}
return $perms;
}
Test Code:
$perms = wordPermutations( 'great', 3 ); // get all permutations of "great" that are 3 or more chars in length
var_dump( $perms );
echo ( '/\b('.implode( '|', $perms ).')\b/' );
Example Output:
array
0 => string 'great' (length=5)
1 => string 'grea' (length=4)
2 => string 'gre' (length=3)
3 => string 'reat' (length=4)
4 => string 'rea' (length=3)
5 => string 'eat' (length=3)
/\b(great|grea|gre|reat|rea|eat)\b/
Just check for the boolean that Regex.IsMatch() returns.
if (Regex.IsMatch(line, "condition") && Regex.IsMatch(line, "conditition2"))
The line will have both regex, right.

Categories

Resources