Display all possible matches for a regex pattern - c#

I have the following RegEx pattern in order to determine some 3-digit exchanges of phone numbers:
(?:2(?:04|[23]6|[48]9|50)|3(?:06|43|65)|4(?:03|1[68]|3[178]|50)|5(?:06|1[49]|79|8[17])|6(?:0[04]|13|39|47)|7(?:0[59]|78|8[02])|8(?:[06]7|19|73)|90[25])
It looks pretty daunting, but it only yields around 40 or 50 numbers. Is there a way in C# to generate all numbers that match this pattern? Offhand, I know I can loop through the numbers 001 thru 999, and check each number against the pattern, but is there a cleaner, built-in way to just generate a list or array of matches?
ie - {"204","226","236",...}

No, there is no off the shelf tool to determine all matches given a regex pattern. Brute force is the only way to test the pattern.
Update
It is unclear why you are using (?: ) which is the "Match but don't capture". It is used to anchor a match, for example take this phone text phone:303-867-5309 where we don't care about the phone: but we want the number.
The pattern used would be
(?:phone\:)(\d{3}-\d{3}-\d{4})
which would match the whole line, but the capture returned would just be the second match of the phone number 303-867-5309.
So the (?: ) as mentioned is used to anchor a match capture at a specific point; with text match text thrown away.
With that said, I have redone your pattern with comments and a test to 2000:
string pattern = #"
^ # Start at beginning of line so no mid number matches erroneously found
(
2(04|[23]6|49|[58]0) # 2 series only match 204, 226, 236, 249, 250, 280
| # Or it is not 2, then match:
3(06|43|65) # 3 series only match 306, 343, 365
)
$ # Further Anchor it to the end of the string to keep it to 3 numbers";
// RegexOptions.IgnorePatternWhitespace allows us to put the pattern over multiple lines and comment it. Does not
// affect regex parsing/processing.
var results = Enumerable.Range(0, 2000) // Test to 2000 so we don't get any non 3 digit matches.
.Select(num => num.ToString().PadLeft(3, '0'))
.Where (num => Regex.IsMatch(num, pattern, RegexOptions.IgnorePatternWhitespace))
.ToArray();
Console.WriteLine ("These results found {0}", string.Join(", ", results));
// These results found 204, 226, 236, 249, 250, 280, 306, 343, 365

I took the advice of #LucasTrzesniewski and just looped through the possible values. Since I know I’m dealing w/ 3-digit numbers, I just looped through the numbers/strings “000” thru “999” and checked for matches like this:
private static void FindRegExMatches(string pattern)
{
for (var i = 0; i < 1000; i++)
{
var numberString = i.ToString().PadLeft(3, '0');
if (!Regex.IsMatch(numberString, pattern)) continue;
Console.WriteLine("Found a match: {0}, numberString);
}
}

Related

Get only the longest match from the groups in regex

I have various strings like '10001110110', '10000', '100001', '00011','0001', '111000' etc..
I need to find out the longest possible combination of 1s with no or 1 zero in between.
I have got a regex like this - (?=(1+01+))
But its not returning a group where there is no leading or trailing one.
I want to regex to consider this case too.
Currently its returning all groups
Eg. if the input string is '10110111' it returns 3 groups
{null, 1011}, {null, 110111} and {null, 10111}
I want my regex to return only 1 match with the longest combination. Is it possible to do so?
For the following rule:
I need to find out the longest possible combination of 1s with no or 1
zero in between.
you can capture 1+ times a 1, and then optionally match 0 followed by again 1+ times a 1 in the lookahead assertion.
(?=(1+(?:0?1+)?))
Regex demo | C# demo
To get the longest result, you can process the matches, and then sort by the length of the string, and then get the first result from the collection.
string pattern = #"(?=(1+(?:0?1+)?))";
string input = #"10001110110 10000 100001 00011 0001 111000 101110111011011";
var result = Regex.Matches(input, pattern)
.Select(m => m.Groups[1].Value)
.OrderByDescending(s => s.Length)
.FirstOrDefault();
Console.WriteLine(result);
Output
1110111

Simplify regex code in C#: Add a space between a digit/decimal and unit

I have a regex code written in C# that basically adds a space between a number and a unit with some exceptions:
dosage_value = Regex.Replace(dosage_value, #"(\d)\s+", #"$1");
dosage_value = Regex.Replace(dosage_value, #"(\d)%\s+", #"$1%");
dosage_value = Regex.Replace(dosage_value, #"(\d+(\.\d+)?)", #"$1 ");
dosage_value = Regex.Replace(dosage_value, #"(\d)\s+%", #"$1% ");
dosage_value = Regex.Replace(dosage_value, #"(\d)\s+:", #"$1:");
dosage_value = Regex.Replace(dosage_value, #"(\d)\s+e", #"$1e");
dosage_value = Regex.Replace(dosage_value, #"(\d)\s+E", #"$1E");
Example:
10ANYUNIT
10:something
10 : something
10 %
40 e-5
40 E-05
should become
10 ANYUNIT
10:something
10: something
10%
40e-5
40E-05
Exceptions are: %, E, e and :.
I have tried, but since my regex knowledge is not top-notch, would someone be able to help me reduce this code with same expected results?
Thank you!
For your example data, you might use 2 capture groups where the second group is in an optional part.
In the callback of replace, check if capture group 2 exists. If it does, use is in the replacement, else add a space.
(\d+(?:\.\d+)?)(?:\s*([%:eE]))?
( Capture group 1
\d+(?:\.\d+)? match 1+ digits with an optional decimal part
) Close group 1
(?: Non capture group to match a as a whole
\s*([%:eE]) Match optional whitespace chars, and capture 1 of % : e E in group 2
)? Close non capture group and make it optional
.NET regex demo
string[] strings = new string[]
{
"10ANYUNIT",
"10:something",
"10 : something",
"10 %",
"40 e-5",
"40 E-05",
};
string pattern = #"(\d+(?:\.\d+)?)(?:\s*([%:eE]))?";
var result = strings.Select(s =>
Regex.Replace(
s, pattern, m =>
m.Groups[1].Value + (m.Groups[2].Success ? m.Groups[2].Value : " ")
)
);
Array.ForEach(result.ToArray(), Console.WriteLine);
Output
10 ANYUNIT
10:something
10: something
10%
40e-5
40E-05
As in .NET \d can also match digits from other languages, \s can also match a newline and the start of the pattern might be a partial match, a bit more precise match can be:
\b([0-9]+(?:\.[0-9]+)?)(?:[\p{Zs}\t]*([%:eE]))?
I think you need something like this:
dosage_value = Regex.Replace(dosage_value, #"(\d+(\.\d*)?)\s*((E|e|%|:)+)\s*", #"$1$3 ");
Group 1 - (\d+(\.\d*)?)
Any number like 123 1241.23
Group 2 - ((E|e|%|:)+)
Any of special symbols like E e % :
Group 1 and Group 2 could be separated with any number of whitespaces.
If it's not working as you asking, please provide some samples to test.
For me it's too complex to be handled just by one regex. I suggest splitting into separate checks. See below code example - I used four different regexes, first is described in detail, the rest can be deduced based on first explanation.
using System.Text.RegularExpressions;
var testStrings = new string[]
{
"10mg",
"10:something",
"10 : something",
"10 %",
"40 e-5",
"40 E-05",
};
foreach (var testString in testStrings)
{
Console.WriteLine($"Input: '{testString}', parsed: '{RegexReplace(testString)}'");
}
string RegexReplace(string input)
{
// First look for exponential notation.
// Pattern is: match zero or more whitespaces \s*
// Then match one or more digits and store it in first capturing group (\d+)
// Then match one ore more whitespaces again.
// Then match part with exponent ([eE][-+]?\d+) and store it in second capturing group.
// It will match lower or uppercase 'e' with optional (due to ? operator) dash/plus sign and one ore more digits.
// Then match zero or more white spaces.
var expForMatch = Regex.Match(input, #"\s*(\d+)\s+([eE][-+]?\d+)\s*");
if(expForMatch.Success)
{
return $"{expForMatch.Groups[1].Value}{expForMatch.Groups[2].Value}";
}
var matchWithColon = Regex.Match(input, #"\s*(\d+)\s*:\s*(\w+)");
if (matchWithColon.Success)
{
return $"{matchWithColon.Groups[1].Value}:{matchWithColon.Groups[2].Value}";
}
var matchWithPercent = Regex.Match(input, #"\s*(\d+)\s*%");
if (matchWithPercent.Success)
{
return $"{matchWithPercent.Groups[1].Value}%";
}
var matchWithUnit = Regex.Match(input, #"\s*(\d+)\s*(\w+)");
if (matchWithUnit.Success)
{
return $"{matchWithUnit.Groups[1].Value} {matchWithUnit.Groups[2].Value}";
}
return input;
}
Output is:
Input: '10mg', parsed: '10 mg'
Input: '10:something', parsed: '10:something'
Input: '10 : something', parsed: '10:something'
Input: '10 %', parsed: '10%'
Input: '40 e-5', parsed: '40e-5'
Input: '40 E-05', parsed: '40E-05'

Trying to match multiple words multiple times, any order using regex

I'm trying to check if a text contains two or more specific words. The words can be in any order an can show up in the text multiple times but at least once.
If the text is a match I will need to get the information about location of the words.
Lets say we have the text :
"Once I went to a store and bought a coke for a dollar and I got another coke for free"
In this example I want to match the words coke and dollar.
So the result should be:
coke : index 37, lenght 4
dollar : index 48, length 6
coke : index 84, length 4
What I have already is this: (which I think is little bit wrong because it should contain each word at least once so the + should be there instead of the *)
(?:(\bcoke\b))\*(?:(\bdollar\b))\*
But with that regex the RegEx Buddy highlights all the three words if I ask it to hightlight group 1 and group 2.
But when I run this in C# I won't get any results.
Can you point me to the right direction ?
I don't think it's possible what you want only using regular expressions.
Here is a possible solution using regular expressions and linq:
var words = new HashSet<string>(StringComparer.OrdinalIgnoreCase) { "coke", "dollar" };
var regex = new Regex(#"\b(?:"+string.Join("|", words)+#")\b", RegexOptions.IgnoreCase);
var text = #"Once I went to a store and bought a coke
for a dollar and I got another coke for free";
var grouped = regex.Matches(text)
.OfType<Match>()
.GroupBy(m => m.Value, StringComparer.OrdinalIgnoreCase)
.ToArray();
if (grouped.Length != words.Count)
{
//not all words were found
}
else
{
foreach (var g in grouped)
{
Console.WriteLine("Found: " + g.Key);
foreach (var match in g)
Console.WriteLine(" At {0} length {1}", match.Index, match.Length);
}
}
Output:
Found: coke
At 36 length 4
At 72 length 4
Found: dollar
At 47 length 6
How about this, it is pret-tay bad but I think it has a shot at working and it is pure RegEx no extra tools.
(?:^|\W)[cC][oO][kK][eE](?:$|\W)|(?:^|\W)[dD][oO][lL][lL][aA][rR](?:$|\W)
Get rid of the \w's if you want it to capture cokeDollar or dollarCoKe etc.

Regex help - match any number of characters

I have following kind of string-sets in a text file:
<< /ImageType 1
/Width 986 /Height 1
/BitsPerComponent 8
/Decode [0 1 0 1 0 1]
/ImageMatrix [986 0 0 -1 0 1]
/DataSource <
803fe0503824160d0784426150b864361d0f8844625138a4562d178c466351b8e4763d1f904864523924964d27944a6552b964b65d2f984c665339a4d66d379c4e6753b9e4f67d3fa05068543a25168d47a4526954ba648202
> /LZWDecode filter >> image } def
There are 100s of Images defined like above.
I need to find all such images defined in the document.
Here is my code -
string txtFile = #"text file path";
string fileContents = File.ReadAllText(txtFile);
string pattern = #"<< /ImageType 1.*(\n|\r|\r\n)*image } def"; //match any number of characters between `<< /ImageType 1` and `image } def`
MatchCollection matchCollection = Regex.Matches(fileContents, pattern, RegexOptions.Singleline);
int count = matchCollection.Count; // returns 1
However, I am getting just one match - whereas there are around 600 images defined.
But it seems they all are matched in one because of 'newline' character used in pattern.
Can anyone please guide what do I need to modify the correct result of regex match as 600.
The reason is that regular expressions are usually greedy, i.e. the matches are always as long as possible. Thus, the image } def is contained in the .*. I think the best approach here would be to perform two separate regex queries, one for << /ImageType 1 and one for image } def. Every match of the first pattern would correspond to exactly one match of the second one and as these matches carry their indices in the original string, you can reconstruct the image by accessing the appropriate substring.
Instead of .* you should use the non-greedy quantifier .*?:
string pattern = #"<< /ImageType 1.*?image } def";
Here is a site that can help you out with REGEX that I use. http://webcheatsheet.com/php/regular_expressions.php.
if(preg_match('/^/[a-z]/i', $string, $matches)){
echo "Match was found <br />";
echo $matches[0];
}

Need multiple regular expression matches using C#

So I have this list of flight data and I need to be able to parse through it using regular expressions (this isn't the entire list).
1 AA2401 F7 A4 Y7 B7 M7 H7 K7 /DFW A LAX 4 0715 0836 E0.M80 9 3:21
2 AA2421 F7 A1 Y7 B7 M7 H7 K7 DFWLAX 4 1106 1215 E0.777 7 3:09
3UA:US6352 B9 M9 H9 K0 /DFW 1 LAX 1200 1448 E0.733 1:48
For example, I might need from the first line 1, AA, 2401, and so on and so on. Now, I'm not asking for someone to come up with a regular expression for me because for the most part I'm getting to where I can pretty much handle that myself. My issue has more to do with being able to store the data some where and access it.
So I'm just trying to initially just "match" the first piece of data I need, which is the line number '1'. My "pattern" for just getting the first number is: ".?(\d{1,2}).*" . The reason it's {1,2} is because obviously once you get past 10 it needs to be able to take 2 numbers. The rest of the line is set up so that it will definitely be a space or a letter.
Here's the code:
var assembly = Assembly.GetExecutingAssembly();
var textStreamReader = new StreamReader(
assembly.GetManifestResourceStream("FlightParser.flightdata.txt"));
List<string> lines = new List<string>();
do
{
lines.Add(textStreamReader.ReadLine());
} while (!textStreamReader.EndOfStream);
Regex sPattern = new Regex(#".?(\d{1,2}).*");//whatever the pattern is
foreach (string line in lines)
{
System.Console.Write("{0,24}", line);
MatchCollection mc = sPattern.Matches(line);
if ( sPattern.IsMatch(line))
{
System.Console.WriteLine(" (match for '{0}' found)", sPattern);
}
else
{
System.Console.WriteLine();
}
System.Console.WriteLine(mc[0].Groups[0].Captures);
System.Console.WriteLine(line);
}//end foreach
System.Console.ReadLine();
With the code I'm writing, I'm basically just trying to get '1' into the match collection and somehow access it and write it to the console (for the sake of testing, that's not the ultimate goal).
Your regex pattern includes an asterisk which matches any number of characters - ie. the whole line. Remove the "*" and it will only match the "1". You may find an online RegEx tester such as this useful.
Assuming your file is not actually formatted as you posted and has each of the fields separated by something, you can match the first two-digit number of the line with this regex (ignoring 0 and leading zeros):
^\s*([1-9]\d?)
Since it is grouped, you can access the matched part through the Groups property of the Match object.
var line = "12 foobar blah 123 etc";
var re = new Regex(#"^\s*([1-9]\d?)");
var match = re.Match(line);
if (match.Success)
{
Console.WriteLine(match.Groups[1].Value); // "12"
}
else
{
Console.WriteLine("No match");
}
The following expression matches the first digit, that you wanted to capture, in the group "First".
^\s*(?<First>\d{1})
I find this regular expression tool highly useful when dealing with regex. Give it a try.
Also set RegexOption to Multiline when you are making the match.

Categories

Resources