Regular Expression to find string and set as variable - c#

I am trying to make a regular expression that will tell me if a string has {0#} where zero can be repeated. Once I confirm that a string has this I am then trying to set it to a variable so I can count the number of 0s and replace the # with another number. I have /([{0]})([#}])/g which works on detection but not on pulling it out to another variable.
Edit:
Thanks to all, the answer was
Regex regex = new Regex(@"\{(0+)(#)\}");
Match match = regex.Match(text);
if (match.Success)
{
int zeros = Regex.Matches(match.Value, "0").Count;
}

Use this:
\{(0+)(#)\}
character {
then one or more occurrences of 0
a # sign
character }

You are super close. The problem is that your capture group, the ( ), needs to be just around the zeroes. You also don't strictly need the other capture group unless you are doing something with it. You can rewrite your regex like this:
{(0+)#}
{ - match '{'
(0+) - match and capture one or more '0'
# - match '#'
} - match '}'
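As a minimal sketch of the whole task from the question (count the zeros, then swap the # for a number), the capture group can be used as follows. The sample text and the number 42 are made up for illustration, and keeping the zeros in front of the inserted number is only one possible reading of "replace the # with another number".

```csharp
using System;
using System.Text.RegularExpressions;

string text = "Invoice {000#} issued"; // hypothetical sample input
int number = 42;                       // hypothetical value that replaces '#'

var pattern = new Regex(@"\{(0+)#\}");
Match match = pattern.Match(text);

// Group 1 holds only the zeros, so its length is the zero count.
int zeros = match.Success ? match.Groups[1].Length : 0;

// Rebuild the placeholder: keep the captured zeros, swap '#' for the number.
string replaced = pattern.Replace(text, m => "{" + m.Groups[1].Value + number + "}");

Console.WriteLine(zeros);
Console.WriteLine(replaced);
```

Reading the count from Groups[1].Length avoids running a second regex over the matched text.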

Related

Regex to get values from a string using C#

I posted this earlier but did not give clear information on what I was trying to achieve.
I am trying to get values from a string using Regex in C#. I don't understand why I can get some values but not others using a similar approach.
Please find the code snippet below.
Kindly let me know what I am missing.
Thanks in advance.
string text = "0*MAO-001*20160409*20160408*Encounter Data Duplicates Report * *ENC000200800400120160407*PRO*PROD*";
//toget the value 20160409 from the above text
//this code works fine
Regex pattern = new Regex(@"([0][*]MAO[-][0][0][1].*?[*](?<Value>\d+)[*])");
Match match = pattern.Match(text);
string Value = match.Groups["Value"].Value.ToString();
//to get the value ENC000200800400120160407 from the above text
// this does not work and gives me nothing
Regex pattern2 = new Regex(@"([0][*]MAO[-][0][0][1].*?[*].*?[*].*?[*].*?[*].*?[*](?<Value2>\d+)[*])");
Match match2 = pattern.Match(text);
string Value2 = match.Groups["Value2"].Value.ToString();
It looks like your text is '*'-delimited.
You can use one single regex to catch all the values.
Try using
((?<values>[^\*]+)\*)
as your pattern.
Each match then holds one field in its values group.
----Update add c# code-----
string text = "0*MAO-001*20160409*20160408*Encounter Data Duplicates Report * *ENC000200800400120160407*PRO*PROD*";
Regex pattern = new Regex(@"(?<values>[^\*]+)\*");
var matches = pattern.Matches(text);
// Matches are zero-based: matches[2] is 20160409, matches[6] is ENC000200800400120160407
string Value = matches[2].Groups["values"].Value;
string Value2 = matches[6].Groups["values"].Value;
Use this for the second regex:
([0][*]MAO[-][0][0][1].*?[*].*?[*].*?[*].*?[*].*?[*](?<Value2>\w+)[*])
\w matches any character from the set [A-Za-z0-9_]. You were using \d, which matches only digits [0-9], but the field you want also contains letters.
In your second try at using the regex, you are matching with pattern and not pattern2.
Match match2 = pattern.Match(text);
string Value2 = match.Groups["Value2"].Value.ToString();
You are also using the Groups from match and not match2.
This is why it is important to give your variables names that say what they represent. Yes, it may be a "pattern", but what does that pattern represent? Vaguely named variables invite mistakes like these.
You almost got it, but the field you're looking for contains letters and digits.
This is your second regex kind of fixed.
([0][*]MAO[-][0][0][1].*?[*](?:.*?[*]){4}(?<Value2>.*?)[*])
( # (1 start)
[0] [*] MAO [-] [0] [0] [1] .*? [*]
(?: .*? [*] ){4}
(?<Value2> .*? ) # (2)
[*]
) # (1 end)
To make it a little less busy, this might be better
(0\*MAO-001.*?\*(?:[^*]*\*){4}(?<Value2>[^*]*)\*)
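Putting the fixes from the answers together, a sketch of the corrected code might look like this. It keeps each pattern paired with its own Match variable and uses the less busy pattern for the second value. The variable names are made up for illustration, and the outer capture group is dropped because only the named group is read.

```csharp
using System;
using System.Text.RegularExpressions;

string text = "0*MAO-001*20160409*20160408*Encounter Data Duplicates Report * *ENC000200800400120160407*PRO*PROD*";

// First field after the MAO-001 marker: digits only.
var firstDatePattern = new Regex(@"0\*MAO-001.*?\*(?<Value>\d+)\*");
Match firstDateMatch = firstDatePattern.Match(text);
string firstDate = firstDateMatch.Groups["Value"].Value;

// Skip four more '*'-terminated fields, then capture the next one,
// which contains letters as well as digits.
var encounterIdPattern = new Regex(@"0\*MAO-001.*?\*(?:[^*]*\*){4}(?<Value2>[^*]*)\*");
Match encounterIdMatch = encounterIdPattern.Match(text);
string encounterId = encounterIdMatch.Groups["Value2"].Value;

Console.WriteLine(firstDate);
Console.WriteLine(encounterId);
```

Naming the variables after the field they extract makes it harder to read a group from the wrong match by accident.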

Error when counting a specific word in a string with Regex.Matches

I tried to find a method to count a specific word in a string, and I found this.
using System.Text.RegularExpressions;
MatchCollection matches = Regex.Matches("hi, hi, everybody.", ",");
int cnt = matches.Count;
Console.WriteLine(cnt);
It worked fine, and the result shows 2.
But when I change "," to ".", it shows 18, not the expected 1. Why?
MatchCollection matches = Regex.Matches("hi, hi, everybody.", ".");
and when I change "," to "(", it shows me an error!
the error reads:
SYSTEM.ARGUMENTEXCEPTION - THERE ARE TOO MANY (...
I don't understand why this is happening
MatchCollection matches = Regex.Matches("hi( hi( everybody.", "(");
Other cases seem to work fine but I need to count "(".
The first case, with the ., uses a special character that has a different meaning in regular expressions. It matches any character (except a newline), so it matched every character in your string; hence the result of 18.
http://www.regular-expressions.info/dot.html
To match an actual "." character, you need to escape it so that it is read as a full stop and not as a special character. Use a verbatim string (@"...") so that C# does not try to interpret the backslash itself.
MatchCollection matches = Regex.Matches("hi, hi, everybody.", @"\.");
The same applies to the ( character. It is a special character in regular expressions, and you need to escape it.
MatchCollection matches = Regex.Matches("hi( hi( everybody.", @"\(");
It looks like you're new to regular expressions, so I'd suggest reading up on them; the link above is a good start.
HOWEVER!
If you just want to count occurrences in a string, you don't need regex.
How would you count occurrences of a string within a string?
If you're using .NET 3.5 you can do this in a one-liner with LINQ:
int cnt = source.Count(f => f == '(');
If you don't want to use LINQ you can do it with:
int cnt = source.Split('(').Length - 1;
The second parameter represents a pattern, not necessarily just a character to search for in your string, and the ( by itself is an invalid pattern.
You don't need Regex to count occurrences of a character. Just use LINQ's Count():
var input = "hi( hi( everybody.";
var occurrences = input.Count(x => x == '('); // 2
The ( character is a special character that marks the start of a group. If you need to use ( as a literal, you need to escape it as \(. That should solve your problem.
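When the text to count is not known in advance, escaping by hand is error-prone. A small sketch, assuming the search text should always be treated literally, is to pass it through Regex.Escape first. The LINQ count from the answers is included for comparison.

```csharp
using System;
using System.Linq;
using System.Text.RegularExpressions;

string input = "hi( hi( everybody.";

// Unescaped "." is a wildcard, so it matches every character in the string.
int wildcardCount = Regex.Matches(input, ".").Count;

// Regex.Escape turns special characters into literals ("(" becomes "\(").
int parenCount = Regex.Matches(input, Regex.Escape("(")).Count;
int dotCount = Regex.Matches(input, Regex.Escape(".")).Count;

// For a single character, LINQ needs no regex at all.
int parenCountLinq = input.Count(c => c == '(');

Console.WriteLine($"{wildcardCount} {parenCount} {dotCount} {parenCountLinq}");
```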

Get only integer value from a string which contains bracket { in C#

I have a simple, very simple regex pattern like:
private static string FORMAT_REGEX = @"\{(\d)\}";
I have a string like I have {323} dollars and I want to get only 323
When I used:
Regex regex = new Regex(FORMAT_REGEX);
Match match = regex.Match(format);
if (match.Success)
{
return match.Groups[0].Value; // here comes {323} instead of 323
}
I'm sure that my pattern is wrong. What is the correct pattern ?
Only a small mistake.
You need a + sign after \d like this: \d+ to capture all digits.
And you need to get the first group: match.Groups[1].Value
Groups[0] always returns the whole match. You need to get the value of Groups[1].
Also, you need to capture multiple digits:
@"\{(\d+)\}";
// not
@"\{(\d)\}";
See the example at MSDN: Match.Groups Property for an example of just this, where you can capture multiple groups as well as the whole string. In that example they use \d{n} to capture exactly n digits.
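A short sketch of both fixes applied to the sentence from the question; the variable names are chosen here for illustration only.

```csharp
using System;
using System.Text.RegularExpressions;

string format = "I have {323} dollars";

// \d+ captures all the digits; the parentheses make them group 1.
Match match = Regex.Match(format, @"\{(\d+)\}");

string wholeMatch = match.Groups[0].Value; // the braces are included
string digits = match.Groups[1].Value;     // only the captured digits
int amount = int.Parse(digits);

Console.WriteLine(wholeMatch);
Console.WriteLine(amount);
```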

Regex to text before set of numbers

I have text like this
Inc12345_Month
Ted12345_Month
J8T12345_Month
What I need to do is extract the 12345 and also remove everything before it. This will be done in C#
.+?(?=\d_Monthly) was working in a regex tester online but when I put it in my code it only returned 5_Month.
Edit: the 12345 could be a variable length so I cannot [0-9] multiple times.
Edit2: Code this was just to try and remove everything before the 12345
string text = /* the above text pulled in from a file */;
Regex reg = new Regex(#".+?(?=\d+_Monthly)");
text = reg.Replace(text, "");
You can use this function to strip it:
private static Regex getNumberAndBeyondRegex = new Regex(@"^.{2}\D+(\d.*)$", RegexOptions.Compiled);
public static string GetNumberAndBeyond(string input)
{
var match = getNumberAndBeyondRegex.Match(input);
if (!match.Success) throw new ArgumentException("String isn't in the correct format.", "input");
return match.Groups[1].Value;
}
The regex at work is ^.{2}\D+(\d.*)$
It works by capturing everything from the first digit that follows at least one non-digit character. It'll match not only _Month but also other endings.
The regex consists of a few parts:
^ matches the beginning of the string
.{2} matches any two characters, to prevent a digit from matching if it's the first or second character; you can increase this number to the minimum prefix length - 1
\D+ matches at least one character that isn't a digit
( starts a capturing group
\d.* matches one digit and anything after it
) closes the capturing group
$ matches the end of the string
There are a lot of different regex flavors, many of them have slight differences in terms of escaping, capturing, replacing and quite surely some others.
For testing .NET regexes online I use the free version of RegexHero. It shows a popup every now and then, but it makes up for that by showing live results, capture groups, and instant replacing, along with quite a lot of other features.
If you want to match anywhere within the string, you can use the regex \d+_Month, which is very similar to your original regex. In code:
new Regex(@"\d+_Month").Match(input).Value
Edit:
Based on the format you supplied in the comment I've created a regex and function to parse the entire file name:
private static Regex parseFileNameRegex = new Regex(#"^.*\D(\d+)_Month_([a-zA-Z]+)\.(\w+)$", RegexOptions.Compiled);
public static bool TryParseFileName(string fileName, out int id, out string month, out string fileExtension)
{
id = 0; month = null; fileExtension = null;
if (fileName == null) return false;
var match = parseFileNameRegex.Match(fileName);
if (!match.Success) return false;
if (!int.TryParse(match.Groups[1].Value, out id) || id < 1) return false; // Convert the ID into a number
month = match.Groups[2].Value;
fileExtension = match.Groups[3].Value;
return true;
}
The parse function requires the ID to be at least 1; 0 isn't accepted (and negative numbers won't match the regex). If you don't want this restriction, simply remove || id < 1 from the function.
Using the function would look like:
int id; string month, fileExtension;
if (!TryParseFileName("CompanyName_ClientName12345_Month_Nov.pdf", out id, out month, out fileExtension))
throw new FormatException("File name is incorrectly formatted."); // Do whatever you want when you get an invalid filename
// Use id, month and fileExtension here :)
The regex ^.*\D(\d+)_Month_([a-zA-Z]+)\.(\w+)$ works like:
^ matches the beginning of the string
.*\D matches any characters followed by one non-digit character
(\d+) captures at least 1 digit, this is the ID
_Month_ is the literal text in between
([a-zA-Z]+) matches and captures at least 1 letter, this is the month
\. matches a . character
(\w+) matches and captures word characters (letters, digits, and underscore), this is the file extension
$ matches the end of the string
Using:
Regex reg = new Regex(@"\D+(?=(\d+)_Monthly)");
is more explicit; the result is in Groups[1].
Part by part:
.+?
Match one or more characters, as few as possible (the ? makes the + lazy).
(?=
start a lookahead group; it must match at this position but is not included in the result
\d
Match exactly 1 digit, which explains what you are seeing: the rest of the number is matched by .+?, which is outside the group
_Monthly
match the literal text
)
end group
I think what you want is:
^.+?(?=\d+_Monthly)
The ^ anchor matters here. Without it, Replace keeps finding matches after the first one and strips the digits one at a time until only the last digit is left.
I guess you are missing the + sign after \d
.+?(?=\d+_Monthly)
This should ask for one or more digits.
If you don't need anything before the number, this should work:
(\d+_Month)
I use Derek Slager's regex tester when I'm working with C# regex.
Better dotnet regular expression tester
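The behaviour reported in the question can be reproduced with a short sketch. It assumes the literal suffix is _Month, as in the sample text (the question's pattern says _Monthly), and compares an unanchored Replace, an anchored Replace, and a direct match on the digits.

```csharp
using System;
using System.Linq;
using System.Text.RegularExpressions;

string[] samples = { "Inc12345_Month", "Ted12345_Month", "J8T12345_Month" };

// Unanchored: Replace removes every match, so after the prefix is removed
// the digits are matched and removed one by one until one digit remains.
string[] unanchored = samples
    .Select(s => Regex.Replace(s, @".+?(?=\d+_Month)", ""))
    .ToArray();

// Anchored with ^: only the prefix at the start of the string can match.
string[] anchored = samples
    .Select(s => Regex.Replace(s, @"^.+?(?=\d+_Month)", ""))
    .ToArray();

// Extracting just the number: digits that are directly followed by _Month.
string[] numbers = samples
    .Select(s => Regex.Match(s, @"\d+(?=_Month)").Value)
    .ToArray();

for (int i = 0; i < samples.Length; i++)
    Console.WriteLine($"{samples[i]}: {unanchored[i]} | {anchored[i]} | {numbers[i]}");
```

Note that for J8T12345_Month the lone 8 is skipped, because it is not directly followed by _Month.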

Regex to match multiple strings

I need to create a regex that can match multiple strings. For example, I want to find all the instances of "good" or "great". I found some examples, but what I came up with doesn't seem to work:
\b(good|great)\w*\b
Can anyone point me in the right direction?
Edit: I should note that I don't want to just match whole words. For example, I may want to match "ood" or "reat" as well (parts of the words).
Edit 2: Here is some sample text: "This is a really great story."
I might want to match "this" or "really", or I might want to match "eall" or "reat".
If you can guarantee that there are no reserved regex characters in your word list (or if you escape them), you could just use this code to turn a big word list into @"(a|big|word|list)". There's nothing wrong with the | operator as you're using it, as long as those () surround it. It sounds like the \w* and the \b patterns are what is interfering with your matches.
String[] pattern_list = whatever;
String regex = String.Format("({0})", String.Join("|", pattern_list));
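If the word list might contain reserved characters, a hedged variant of the snippet above escapes each entry with Regex.Escape before joining. The word list here is invented for illustration (the "c++" entry is there only to show the escaping), and the sample sentence extends the one from the question.

```csharp
using System;
using System.Linq;
using System.Text.RegularExpressions;

string[] patternList = { "good", "great", "ood", "reat", "c++" }; // hypothetical list

// Escape each entry so characters like '+' are matched literally.
string regex = "(" + string.Join("|", patternList.Select(w => Regex.Escape(w))) + ")";

string text = "This is a really great story about c++.";
string[] found = Regex.Matches(text, regex)
    .Cast<Match>()
    .Select(m => m.Value)
    .ToArray();

Console.WriteLine(regex);
Console.WriteLine(string.Join(", ", found));
```

Alternatives are tried left to right at each position, so when one entry is a prefix of another, put the longer one first.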
(good)*(great)*
after your edit:
\b(g*o*o*d*)*(g*r*e*a*t*)*\b
I think you are asking for something you don't really mean.
If you want to search for any part of a word, you are literally searching for letters.
e.g. searching for {Jack, Jim} in "John and Shelly are cool"
means searching for all the letters in the names {J,a,c,k,i,m}
*J*ohn *a*nd Shelly *a*re
and for that you don't need regex :)
In my opinion,
a suffix tree can help you with that:
http://en.wikipedia.org/wiki/Suffix_tree#Functionality
Enjoy.
I'm not sure I understand the problem correctly:
If you want to match "great" or "reat" you can express this by a pattern like:
"g?reat"
This simply says that the "reat"-part must exist and the "g" is optional.
This would match "reat" and "great" but not "eat", because the first "r" in "reat" is required.
If you have the two words "great" and "good" and you want to match them both with an optional "g", you can write it like this:
(g?reat|g?ood)
And if you want to include a word-boundary like:
\b(g?reat|g?ood)
You should be aware that this would not match anything like "breat": the "reat" is there, but the "r" is not at a word boundary because of the "b".
So if you want to match whole words that contain a substring like "reat" or "ood", then you should try:
"\b\w*?(reat|ood)\w*\b"
This reads:
1. Beginning at a word boundary, match any number of word characters, but don't be greedy.
2. Match "reat" or "ood"; this ensures that only words containing one of them are matched.
3. Match any number of word characters following "reat" or "ood" until the next word boundary is reached.
This will match:
"goodness", "good", "ood" (if a complete word)
It can be read as: Give me all complete words that contain "ood" or "reat".
Is that what you are looking for?
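A quick sketch of the whole-word idea in C#. The sample sentence is made up, and the pattern ends in \w* (zero or more) so that words ending in the fragment, such as "good", also match.

```csharp
using System;
using System.Linq;
using System.Text.RegularExpressions;

string text = "goodness is a great treat, good food"; // hypothetical sample

// Whole words that contain "reat" or "ood" anywhere inside them.
var pattern = new Regex(@"\b\w*?(reat|ood)\w*\b");

string[] words = pattern.Matches(text)
    .Cast<Match>()
    .Select(m => m.Value)
    .ToArray();

Console.WriteLine(string.Join(", ", words));
```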
I'm not entirely sure that regex alone offers a solution for what you're trying to do. You could, however, use the following code to build a regex for a given word, although the resulting pattern has the potential to become very long and slow:
function wordPermutations( $word, $minLength = 2 )
{
    $perms = array( );
    for ($start = 0; $start < strlen( $word ); $start++)
    {
        for ($end = strlen( $word ); $end > $start; $end--)
        {
            $perm = substr( $word, $start, ($end - $start));
            if (strlen( $perm ) >= $minLength)
            {
                $perms[] = $perm;
            }
        }
    }
    return $perms;
}
Test Code:
$perms = wordPermutations( 'great', 3 ); // get all permutations of "great" that are 3 or more chars in length
var_dump( $perms );
echo ( '/\b('.implode( '|', $perms ).')\b/' );
Example Output:
array
0 => string 'great' (length=5)
1 => string 'grea' (length=4)
2 => string 'gre' (length=3)
3 => string 'reat' (length=4)
4 => string 'rea' (length=3)
5 => string 'eat' (length=3)
/\b(great|grea|gre|reat|rea|eat)\b/
Just check the boolean that Regex.IsMatch() returns.
if (Regex.IsMatch(line, "condition") && Regex.IsMatch(line, "condition2"))
If this is true, the line matches both regexes.
