Get full word from MatchCollection C# not group

Suppose I have words such as A 1, A 12, A 123, A 1234, and likewise B 1, B 12, B 123, B 1234, where 123 stands for any three digits.
At the moment I do this:
MatchCollection ForA1 = Regex.Matches(inFile, @"\b(A [0-9])\b");
MatchCollection ForA2 = Regex.Matches(inFile, @"\b(A [0-9][0-9])\b");
... and so on for three and four digits, and again for B; 8 lines in total.
To reduce the code I did this:
MatchCollection ForAB1 = Regex.Matches(inFile, @"\b(A [0-9]|B [0-9])\b");
MatchCollection ForAB2 = Regex.Matches(inFile, @"\b(A [0-9][0-9]|B [0-9][0-9])\b");
... and so on for three and four digits; 4 lines in total.
Now I want to do this:
MatchCollection ForAB1234 = Regex.Matches(inFile, @"\b(A [0-9]|B [0-9]...
|A [0-9][0-9]|B [0-9][0-9] and so on for three and four digits )\b"); total 1 line
Currently, after matching, I do this:
foreach (Match m in ForAB1)
{
//many calculations on the basis of index and length etc
}
What I want:
foreach (Match m in ForAB1)
{
if(Match is 'A [0-9]')
{//many calculations on the basis of index and length etc}
else...
}
Is there a simpler way, so that I do not need to repeat code just because the number of digits differs? I am looking for all the distinct matches of the alternatives I combined with |.
Edit: The real problem is that I do not want to check m.Length and then check whether it is A or B, because in reality I have more than thirty such expressions.

To make sure you only handle the A 1 type and not the A 11 type, anchor each check, for example:
foreach (Match m in ForAB1)
{
if (Regex.IsMatch(m.Value, @"^A [0-9]$"))
{//many calculations on the basis of index and length etc}
else if (Regex.IsMatch(m.Value, @"^A [0-9]{2}$"))
{//many calculations on the basis of index and length etc}
else...
}
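With thirty-odd expressions, another option is to give every alternative its own named group in one combined pattern and ask each match which group succeeded, instead of re-testing m.Value. The sketch below assumes that approach. The group names A1, A2, B1, B2 and the helpers MatchedAlternative and Classify are made up for illustration, and only the one- and two-digit forms are shown.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

// Hypothetical helper: returns the name of the named group that matched.
static string MatchedAlternative(Regex regex, Match m)
{
    foreach (string name in regex.GetGroupNames())
    {
        // GetGroupNames also returns numbered groups such as "0"; skip those.
        if (!int.TryParse(name, out _) && m.Groups[name].Success)
            return name;
    }
    return "";
}

// Hypothetical helper: lists "groupName@index" for every match in the input.
static List<string> Classify(string input)
{
    var regex = new Regex(
        @"\b(?:(?<A1>A [0-9])|(?<A2>A [0-9]{2})|(?<B1>B [0-9])|(?<B2>B [0-9]{2}))\b");
    var found = new List<string>();
    foreach (Match m in regex.Matches(input))
        found.Add($"{MatchedAlternative(regex, m)}@{m.Index}");
    return found;
}

Console.WriteLine(string.Join(", ", Classify("A 1 B 12 A 34 B 7")));
// prints: A1@0, B2@4, A2@9, B1@14
```

The returned name could then drive a switch or a dictionary of handlers, so the per-kind calculations are written once per group name rather than once per pattern.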

Related

How to match different scenarios with regex in c# and groups

I want to match these different scenarios with a regex pattern. The main delimiter is #:
1234-1111-234.011#333 => [id = 1234-1111-234.011 and code = 333]
whatever text before 1234-1111-234.011#333 => [textb = whatever text before, id = 1234-1111-234.011, code = 333, texta = ""]
1234-1111-234.011#333 whatever text after => [textb = "", id = 1234-1111-234.011, code = 333, texta = whatever text after]
Text can appear at the beginning, at the end, or both.
In every case the code can also carry a postfix letter W, like 1234-1111-234.011#333W => code = 333W
textb = text of at most 15 characters; letters and numbers only.
id = 17 characters long, in the format XXXX-XXXX-XXX.XXX
code = 3 or 4 characters long, depending on whether the letter W is present
texta = text of at most 15 characters; letters and numbers only.
I am trying to match these scenarios with this piece of code and groups
pattern ="
?(<textb>[\w\s]{15})#
?(<id>[\d\s]{17,17})#
(?<code>([A-Z]{0,1}\d{2,3}))
(?<wo>[W]{1})
?(<texta>[\w\s]{15})"
and
var textb = Regex.Match(mytext, pattern).Groups["textb"].Value;
var id= Regex.Match(mytext, pattern).Groups["id"].Value;
var code= Regex.Match(mytext, pattern).Groups["code"].Value;
var wo= Regex.Match(mytext, pattern).Groups["wo"].Value;
var texta= Regex.Match(mytext, pattern).Groups["texta"].Value;
A full example is "This is before text 234-1111-234.011#333E This is next text"
But nothing matches at all.

You could do it with one regular expression and then use Groups to get the parts you need.
void Main()
{
var input = "Before text 1234-1111-234.011#333E After text";
var pattern = @"(?<btext>[\w ]{0,15})(?<id>[\d\-\.]{17})#(?<code>[\d]{2,3})(?<wo>[A-Z]?)(?<atext>[\w ]{0,15})";
var matches = Regex.Match(input, pattern);
var btext = matches.Groups["btext"];
var wo = matches.Groups["wo"];
Console.WriteLine(btext.Value);
Console.WriteLine(wo.Value);
// etc.
}
(?<btext>[\w ]{0,15}) // Match letters, numbers and spaces, minimum 0 chars, maximum 15 chars
(?<id>[\d\-\.]{17}) // match numbers, '-' and '.'. Must be 17 chars
# // Match pound sign
(?<code>[\d]{2,3}) // Match numbers 2 or 3 chars long
(?<wo>[A-Z]?) // Match optional letter after code
(?<atext>[\w ]{0,15}) // Match letters, numbers and spaces, minimum 0 chars, maximum 15 chars
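If the id must follow the XXXX-XXXX-XXX.XXX layout exactly, the id group can spell that layout out instead of accepting any 17 digits, dashes, and dots. The following is a sketch under a few assumptions that are not in the answer above: the helper name Parse is hypothetical, the optional trailing letter is kept inside the code group, the code is taken to be exactly three digits, and the surrounding text is limited to ASCII letters, digits, and spaces.

```csharp
using System;
using System.Text.RegularExpressions;

// Hypothetical helper: splits the input into its four parts.
// All parts are empty strings when the input does not match.
static (string TextB, string Id, string Code, string TextA) Parse(string input)
{
    var pattern = new Regex(
        @"(?<textb>[A-Za-z0-9 ]{0,15}?)\s*" +   // optional text before (lazy, so the separating space is not captured)
        @"(?<id>\d{4}-\d{4}-\d{3}\.\d{3})#" +   // id in the exact XXXX-XXXX-XXX.XXX layout, then the delimiter
        @"(?<code>\d{3}[A-Z]?)\s*" +            // three digits plus an optional letter (assumption)
        @"(?<texta>[A-Za-z0-9 ]{0,15})");       // optional text after
    Match m = pattern.Match(input);
    if (!m.Success)
        return ("", "", "", "");
    return (m.Groups["textb"].Value, m.Groups["id"].Value,
            m.Groups["code"].Value, m.Groups["texta"].Value);
}

var parsed = Parse("Before text 1234-1111-234.011#333E After text");
Console.WriteLine($"textb='{parsed.TextB}' id='{parsed.Id}' code='{parsed.Code}' texta='{parsed.TextA}'");
```

Calling Regex.Match once and reading every group from the same Match object also avoids re-running the pattern for each field.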

Simplify regex code in C#: Add a space between a digit/decimal and unit

I have a regex code written in C# that basically adds a space between a number and a unit with some exceptions:
dosage_value = Regex.Replace(dosage_value, @"(\d)\s+", @"$1");
dosage_value = Regex.Replace(dosage_value, @"(\d)%\s+", @"$1%");
dosage_value = Regex.Replace(dosage_value, @"(\d+(\.\d+)?)", @"$1 ");
dosage_value = Regex.Replace(dosage_value, @"(\d)\s+%", @"$1% ");
dosage_value = Regex.Replace(dosage_value, @"(\d)\s+:", @"$1:");
dosage_value = Regex.Replace(dosage_value, @"(\d)\s+e", @"$1e");
dosage_value = Regex.Replace(dosage_value, @"(\d)\s+E", @"$1E");
Example:
10ANYUNIT
10:something
10 : something
10 %
40 e-5
40 E-05
should become
10 ANYUNIT
10:something
10: something
10%
40e-5
40E-05
Exceptions are: %, E, e and :.
I have tried, but since my regex knowledge is not top-notch, would someone be able to help me reduce this code with same expected results?
Thank you!

For your example data, you might use 2 capture groups where the second group is in an optional part.
In the replacement callback, check whether capture group 2 matched. If it did, use it in the replacement; otherwise add a space.
(\d+(?:\.\d+)?)(?:\s*([%:eE]))?
( Capture group 1
\d+(?:\.\d+)? Match 1+ digits with an optional decimal part
) Close group 1
(?: Non-capturing group, matched as a whole
\s*([%:eE]) Match optional whitespace chars, and capture one of % : e E in group 2
)? Close the non-capturing group and make it optional
.NET regex demo
string[] strings = new string[]
{
"10ANYUNIT",
"10:something",
"10 : something",
"10 %",
"40 e-5",
"40 E-05",
};
string pattern = @"(\d+(?:\.\d+)?)(?:\s*([%:eE]))?";
var result = strings.Select(s =>
Regex.Replace(
s, pattern, m =>
m.Groups[1].Value + (m.Groups[2].Success ? m.Groups[2].Value : " ")
)
);
Array.ForEach(result.ToArray(), Console.WriteLine);
Output
10 ANYUNIT
10:something
10: something
10%
40e-5
40E-05
In .NET, \d can also match digits from other scripts, \s can also match a newline, and the pattern could start matching in the middle of a word, so a slightly more precise pattern is:
\b([0-9]+(?:\.[0-9]+)?)(?:[\p{Zs}\t]*([%:eE]))?

I think you need something like this:
dosage_value = Regex.Replace(dosage_value, @"(\d+(\.\d*)?)\s*((E|e|%|:)+)\s*", @"$1$3 ");
Group 1 - (\d+(\.\d*)?)
Any number, like 123 or 1241.23
Group 3 - ((E|e|%|:)+)
One or more of the special symbols E e % :
Group 1 and Group 3 can be separated by any amount of whitespace.
If it does not work the way you want, please provide some samples to test.

For me this is too complex to handle with just one regex, so I suggest splitting it into separate checks. In the code example below I use four different regexes; the first is described in detail, and the rest can be deduced from that explanation.
using System.Text.RegularExpressions;
var testStrings = new string[]
{
"10mg",
"10:something",
"10 : something",
"10 %",
"40 e-5",
"40 E-05",
};
foreach (var testString in testStrings)
{
Console.WriteLine($"Input: '{testString}', parsed: '{RegexReplace(testString)}'");
}
string RegexReplace(string input)
{
// First look for exponential notation.
// Pattern is: match zero or more whitespace characters \s*
// Then match one or more digits and store them in the first capturing group (\d+)
// Then match one or more whitespace characters again.
// Then match the exponent part ([eE][-+]?\d+) and store it in the second capturing group.
// It will match a lower- or uppercase 'e' with an optional (due to the ? operator) dash/plus sign and one or more digits.
// Then match zero or more whitespace characters.
var expForMatch = Regex.Match(input, @"\s*(\d+)\s+([eE][-+]?\d+)\s*");
if(expForMatch.Success)
{
return $"{expForMatch.Groups[1].Value}{expForMatch.Groups[2].Value}";
}
var matchWithColon = Regex.Match(input, @"\s*(\d+)\s*:\s*(\w+)");
if (matchWithColon.Success)
{
return $"{matchWithColon.Groups[1].Value}:{matchWithColon.Groups[2].Value}";
}
var matchWithPercent = Regex.Match(input, @"\s*(\d+)\s*%");
if (matchWithPercent.Success)
{
return $"{matchWithPercent.Groups[1].Value}%";
}
var matchWithUnit = Regex.Match(input, @"\s*(\d+)\s*(\w+)");
if (matchWithUnit.Success)
{
return $"{matchWithUnit.Groups[1].Value} {matchWithUnit.Groups[2].Value}";
}
return input;
}
Output is:
Input: '10mg', parsed: '10 mg'
Input: '10:something', parsed: '10:something'
Input: '10 : something', parsed: '10:something'
Input: '10 %', parsed: '10%'
Input: '40 e-5', parsed: '40e-5'
Input: '40 E-05', parsed: '40E-05'

Trying to match multiple words multiple times, any order using regex

I'm trying to check whether a text contains two or more specific words. The words can be in any order and can show up in the text multiple times, but each must appear at least once.
If the text is a match, I need to get the location of the words.
Let's say we have the text:
"Once I went to a store and bought a coke for a dollar and I got another coke for free"
In this example I want to match the words coke and dollar.
So the result should be:
coke : index 37, length 4
dollar : index 48, length 6
coke : index 84, length 4
What I have already is this (which I think is a little bit wrong, because the text should contain each word at least once, so there should be a + instead of the *):
(?:(\bcoke\b))*(?:(\bdollar\b))*
With that regex, RegexBuddy highlights all three words if I ask it to highlight group 1 and group 2.
But when I run this in C# I don't get any results.
Can you point me in the right direction?

I don't think what you want is possible using regular expressions alone.
Here is a possible solution using regular expressions and LINQ:
var words = new HashSet<string>(StringComparer.OrdinalIgnoreCase) { "coke", "dollar" };
var regex = new Regex(@"\b(?:"+string.Join("|", words)+@")\b", RegexOptions.IgnoreCase);
var text = @"Once I went to a store and bought a coke
for a dollar and I got another coke for free";
var grouped = regex.Matches(text)
.OfType<Match>()
.GroupBy(m => m.Value, StringComparer.OrdinalIgnoreCase)
.ToArray();
if (grouped.Length != words.Count)
{
//not all words were found
}
else
{
foreach (var g in grouped)
{
Console.WriteLine("Found: " + g.Key);
foreach (var match in g)
Console.WriteLine(" At {0} length {1}", match.Index, match.Length);
}
}
Output:
Found: coke
At 36 length 4
At 72 length 4
Found: dollar
At 47 length 6

How about this? It is pretty bad, but I think it has a shot at working, and it is pure regex with no extra tools.
(?:^|\W)[cC][oO][kK][eE](?:$|\W)|(?:^|\W)[dD][oO][lL][lL][aA][rR](?:$|\W)
Get rid of the (?:^|\W) and (?:$|\W) parts if you also want it to match inside cokeDollar, dollarCoKe, etc.
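For comparison, here is a sketch of a regex-only variant that combines two ideas: lookaheads to assert that each word occurs at least once, and RegexOptions.IgnoreCase with \b in place of the hand-written character classes. The helper name FindCokeAndDollar is hypothetical, and the two words are hard-coded to keep the example short.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

// Hypothetical helper: returns (word, index, length) for every hit, or an
// empty list when at least one of the two words is missing from the text.
static List<(string Word, int Index, int Length)> FindCokeAndDollar(string text)
{
    var hits = new List<(string Word, int Index, int Length)>();

    // Each lookahead scans forward from the start of the text for one required word.
    var containsAll = new Regex(@"\A(?=.*\bcoke\b)(?=.*\bdollar\b)",
        RegexOptions.IgnoreCase | RegexOptions.Singleline);
    if (!containsAll.IsMatch(text))
        return hits;

    // Both words are present, so report every occurrence of either one.
    foreach (Match m in Regex.Matches(text, @"\b(?:coke|dollar)\b", RegexOptions.IgnoreCase))
        hits.Add((m.Value, m.Index, m.Length));
    return hits;
}

string sentence = "Once I went to a store and bought a coke for a dollar and I got another coke for free";
foreach (var hit in FindCokeAndDollar(sentence))
    Console.WriteLine($"{hit.Word} : index {hit.Index}, length {hit.Length}");
// prints:
// coke : index 36, length 4
// dollar : index 47, length 6
// coke : index 72, length 4
```

Match.Index is zero-based, which is why these positions are one lower than a one-based count would give.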

Regular expression match all numbers after the last dash?

I'm trying to find the last run of digits after the last dash in a string, so that:
test-123-2-456 would return 456
123-test would return ""
123-test-456 would return 456
123-test-456sdfsdf would return 456
123-test-asd456 would return 456
The expression @"[^-]*$" does not match the numbers though, and I have tried using [\d], but to no avail.

Sure, the simplest solution would be something like this:
(\d+)[^-]*$
This will match one or more digits, captured in group 1, followed by zero or more of any character other than a hyphen, followed by the end of the string. In other words, it will match any sequence of digits as long as there are no hyphens between that sequence and the end of the string. You then just have to extract group 1 from the match. For example:
var inputs = new[] {
"test-123-2-456",
"123-test",
"123-test-456",
"123-test-456sdfsdf",
"123-test-asd456"
};
foreach(var str in inputs)
{
var m = Regex.Match(str, @"(\d+)[^-]*$");
Console.WriteLine("{0} --> {1}", str, m.Groups[1].Value);
}
Produces:
test-123-2-456 --> 456
123-test -->
123-test-456 --> 456
123-test-456sdfsdf --> 456
123-test-asd456 --> 456
Alternatively, you could use a negative lookahead like this:
\d+(?!.*-)
This will match one or more digit characters so long as no hyphen follows them anywhere later in the string. Only the digits will be included in the match.
Note that these two options behave differently if there are two or more sets of numbers after the last -, e.g. foo-123bar456. In this case it's not entirely clear what you want to happen, but the first pattern will simply match everything from the first sequence of digits to the end (123bar456), with group 1 containing only the first sequence of digits (123). If you'd like to change this so that it only captures the last sequence of digits, place a \d inside the character class (i.e. (\d+)[^\d-]*$). The second pattern would produce a separate match for each sequence of digits (in this example, 123 and 456), but the Regex.Match method will only give you the first match.
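As a small sketch of the (\d+)[^\d-]*$ variant described above, the helper below (the name LastDigitRun is made up for this example) returns the last run of digits that has no dash after it, or an empty string when there is none. Note that, like the patterns above, it would also return the digits of a string that contains no dash at all.

```csharp
using System;
using System.Text.RegularExpressions;

// Hypothetical helper: the last run of digits with no dash after it, or "".
static string LastDigitRun(string s)
{
    // [^\d-]*$ allows only non-digit, non-dash characters between the
    // captured digits and the end of the string.
    Match m = Regex.Match(s, @"(\d+)[^\d-]*$");
    return m.Success ? m.Groups[1].Value : "";
}

foreach (var sample in new[] { "test-123-2-456", "123-test", "123-test-asd456", "foo-123bar456" })
    Console.WriteLine("{0} --> {1}", sample, LastDigitRun(sample));
// prints:
// test-123-2-456 --> 456
// 123-test -->
// 123-test-asd456 --> 456
// foo-123bar456 --> 456
```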

I suggest applying two regexes, taking the result of the first one as the input for the second one.
The first regex is:
-[0-9]+[^-]+$ // Take the last piece of your string, led by a minus (-)
// followed by digits ([0-9]+)
// and some ugly rest that doesn't contain another minus ([^-]+$)
The second regex is:
-[0-9]+ // Separate the relevant digits from the ugly rest
// You know that there can only be one minus + digits part in it
Tested here: http://derekslager.com/blog/posts/2007/09/a-better-dotnet-regular-expression-tester.ashx

The latest group from this RegEx can get the last number for you:
[^-A-z][0-9]+[^A-z]
If you collect every run of digits, you can take the last match to get the last number:
var inputs = new[] {
"test-123-2-456",
"123-test",
"123-test-456",
"123-test-456sdfsdf",
"123-test-asd456"
};
foreach (var str in inputs)
{
var mc = Regex.Matches(str, @"[0-9]+");
if (mc.Count > 1) // This skips values whose only number is the one at the start.
Console.WriteLine("{0} --> {1}", str, mc[mc.Count - 1].Value);
}

Need multiple regular expression matches using C#

So I have this list of flight data and I need to be able to parse through it using regular expressions (this isn't the entire list).
1 AA2401 F7 A4 Y7 B7 M7 H7 K7 /DFW A LAX 4 0715 0836 E0.M80 9 3:21
2 AA2421 F7 A1 Y7 B7 M7 H7 K7 DFWLAX 4 1106 1215 E0.777 7 3:09
3UA:US6352 B9 M9 H9 K0 /DFW 1 LAX 1200 1448 E0.733 1:48
For example, from the first line I might need 1, AA, 2401, and so on. Now, I'm not asking for someone to come up with a regular expression for me, because for the most part I can handle that myself. My issue has more to do with being able to store the data somewhere and access it.
So initially I'm just trying to "match" the first piece of data I need, which is the line number '1'. My pattern for just getting the first number is: ".?(\d{1,2}).*" . The reason it's {1,2} is that once you get past 9 it needs to be able to take 2 digits. The rest of the line is set up so that what follows will definitely be a space or a letter.
Here's the code:
var assembly = Assembly.GetExecutingAssembly();
var textStreamReader = new StreamReader(
assembly.GetManifestResourceStream("FlightParser.flightdata.txt"));
List<string> lines = new List<string>();
do
{
lines.Add(textStreamReader.ReadLine());
} while (!textStreamReader.EndOfStream);
Regex sPattern = new Regex(@".?(\d{1,2}).*");//whatever the pattern is
foreach (string line in lines)
{
System.Console.Write("{0,24}", line);
MatchCollection mc = sPattern.Matches(line);
if ( sPattern.IsMatch(line))
{
System.Console.WriteLine(" (match for '{0}' found)", sPattern);
}
else
{
System.Console.WriteLine();
}
System.Console.WriteLine(mc[0].Groups[0].Captures);
System.Console.WriteLine(line);
}//end foreach
System.Console.ReadLine();
With the code I'm writing, I'm basically just trying to get '1' into the match collection and somehow access it and write it to the console (for the sake of testing, that's not the ultimate goal).

Your regex pattern includes an asterisk, which matches any number of characters, i.e. the whole line. Remove the "*" and it will only match the "1". You may find an online regex tester useful.

Assuming your file is not actually formatted as you posted and has each of the fields separated by something, you can match the first one- or two-digit number of the line with this regex (ignoring 0 and leading zeros):
^\s*([1-9]\d?)
Since it is grouped, you can access the matched part through the Groups property of the Match object.
var line = "12 foobar blah 123 etc";
var re = new Regex(@"^\s*([1-9]\d?)");
var match = re.Match(line);
if (match.Success)
{
Console.WriteLine(match.Groups[1].Value); // "12"
}
else
{
Console.WriteLine("No match");
}

The following expression matches the first digit, the one you wanted to capture, in the named group "First".
^\s*(?<First>\d{1})
I find this regular expression tool highly useful when dealing with regex. Give it a try.
Also set RegexOptions.Multiline when you make the match, so that ^ matches at the start of every line.
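To illustrate what RegexOptions.Multiline changes, here is a minimal sketch that runs one pattern over the whole text and collects the leading number of every line. It departs from the expression above in two labeled ways: it uses \d{1,2} (as in the question) so that two-digit line numbers are captured whole, and [ \t]* instead of \s* so that leading whitespace cannot run across a line break. The helper name LeadingNumbers is made up for this example.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

// Hypothetical helper: collects the leading number of every line.
static List<string> LeadingNumbers(string allLines)
{
    var numbers = new List<string>();
    // With RegexOptions.Multiline, ^ matches at the start of each line,
    // not only at the start of the whole string.
    var regex = new Regex(@"^[ \t]*(?<First>\d{1,2})", RegexOptions.Multiline);
    foreach (Match m in regex.Matches(allLines))
        numbers.Add(m.Groups["First"].Value);
    return numbers;
}

string flightData =
    "1 AA2401 F7 A4 Y7 B7 M7 H7 K7 /DFW A LAX 4 0715 0836 E0.M80 9 3:21\n" +
    "2 AA2421 F7 A1 Y7 B7 M7 H7 K7 DFWLAX 4 1106 1215 E0.777 7 3:09\n" +
    "3UA:US6352 B9 M9 H9 K0 /DFW 1 LAX 1200 1448 E0.733 1:48";

Console.WriteLine(string.Join(", ", LeadingNumbers(flightData)));
// prints: 1, 2, 3
```

Storing the values in a list (or in a small class per line) addresses the original concern of keeping the parsed data somewhere it can be accessed later.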
