Using a regex with 'or' operator and getting matched groups? - c#

I have some string in a file in the format
rid="deqn1-2"
rid="deqn3"
rid="deqn4-5a"
rid="deqn5b-7"
rid="deqn7-8"
rid="deqn9a-10v"
rid="deqn11a-12c"
I want a regex to match each deqnX-Y where X and Y are either both integers or both a combination of an integer and a letter, and, if there is a match, to store X and Y in some variables.
I tried using the regex (^(\d+)-(\d+)$|^(\d+[a-z])-(\d+[a-z]))$
, but how do I get the values of the matched groups in variables?
For a match between two integers the groups would be (I think)
Groups[2].Value
Groups[3].Value
and for match between two integer and alphabet combo will be
Groups[4].Value
Groups[5].Value
How do I determine which match actually occurred and then capture the matching groups accordingly?

As branch reset (?|...) is not supported in C#, we can use named capturing groups that share the same name, like
deqn(?:(?<match1>\d+)-(?<match2>\d+)|(?<match1>\d+\w+)-(?<match2>\d+\w+))\b
C# code
String sample = "deqn1-2";
Regex regex = new Regex("deqn(?:(?<match1>\\d+)-(?<match2>\\d+)|(?<match1>\\d+\\w+)-(?<match2>\\d+\\w+))\\b");
Match match = regex.Match(sample);
if (match.Success) {
Console.WriteLine(match.Groups["match1"].Value);
Console.WriteLine(match.Groups["match2"].Value);
}

You could simply not care. One of the pairs will be empty anyway. So what if you just interpret the result as a combination of both? Just slap them together. First value of the first pair plus first value of the second pair, and second value of the first pair plus second value of the second pair. This always gives the right result.
Regex regex = new Regex("^deqn(?:(\\d+)-(\\d+)|(\\d+[a-z])-(\\d+[a-z]))$");
foreach (String str in listData)
{
Match match = regex.Match(str);
if (!match.Success)
continue;
String value1 = match.Groups[1].Value + match.Groups[3].Value;
String value2 = match.Groups[2].Value + match.Groups[4].Value;
// process your strings
// ...
}
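If you do need to know which alternative matched, one option is to check Group.Success on the first group of each pair. The following is only a sketch: the class name DeqnParser and the "integers"/"alphanumeric" labels are made up for illustration, and it assumes the input has already been reduced to the bare deqnX-Y value (without the surrounding rid="...").

```csharp
using System;
using System.Text.RegularExpressions;

public static class DeqnParser
{
    // Groups 1-2: both parts are plain integers.
    // Groups 3-4: both parts are digits followed by one lowercase letter.
    private static readonly Regex Pattern =
        new Regex(@"^deqn(?:(\d+)-(\d+)|(\d+[a-z])-(\d+[a-z]))$");

    // Returns null when the input does not match at all.
    public static (string Kind, string X, string Y)? Parse(string input)
    {
        Match match = Pattern.Match(input);
        if (!match.Success)
            return null;

        // A group that did not take part in the match reports Success == false.
        if (match.Groups[1].Success)
            return ("integers", match.Groups[1].Value, match.Groups[2].Value);

        return ("alphanumeric", match.Groups[3].Value, match.Groups[4].Value);
    }

    public static void Main()
    {
        foreach (string s in new[] { "deqn1-2", "deqn9a-10v", "deqn4-5a", "deqn3" })
        {
            var result = Parse(s);
            Console.WriteLine(result == null
                ? s + ": no match"
                : s + ": " + result.Value.Kind + " " + result.Value.X + " " + result.Value.Y);
        }
    }
}
```

Mixed inputs such as deqn4-5a and inputs without a range such as deqn3 are rejected by this pattern, which matches the "both integers or both combinations" requirement in the question.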

Related

Select dynamically a list of char in a string from a DLL

I'm new here and my English will not be the best you'll read today.
I just imported from a DLL a list of "key"
(#8yg54w-#95jz#e-##9ixop-#7ps-#ny#9qv-#+pzbk5-#bp669x-#bp6696-#bp6696-#bp6696-#bp6696-#bp6696-#fbhstu-#ehddtk-####9py),
we will name it this way it's a simple string.
I need to select the "keys" that compose this string, each one following a #, but it has to be done dynamically and not by picking fixed indexes from an ArrayList [0,1,2 ...].
The end result should look like 8yg54w, and after you get this one the loop continues and you get the next one, which is 95jz#e. The first "#" of each key is a separator.
I want to know how I can proceed to get each key after the first separator.
I'll try to answer your questions because I think that there will be some, this is probably poorly explained, I apologize in advance! Thanks
Your solution may be a simple function that returns an IEnumerable<string>. You can accomplish this by splitting the string and using the yield keyword to return an iterator. E.g.
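// Define the splitting function
public IEnumerable<string> GetKeys(string source) {
    // Drop the leading separator so the first key does not keep its '#'
    if (source.StartsWith("#"))
        source = source.Substring(1);
    var splitted = source.Split("-#");
    foreach (var key in splitted)
        yield return key;
}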
// Use it in your code
var myKeys = GetKeys("#8yg54w-#95jz#e-##9ixop-#7ps-#ny#9qv-#+pzbk5-#bp669x-#bp6696-#bp6696-#bp6696-#bp6696-#bp6696-#fbhstu-#ehddtk-####9py");
foreach(var k in myKeys) {
// This will print your keys in the console one per line.
Console.WriteLine(k);
}
You can use this approach, but I suggest hiding the logic for getting the next key if you need it to be a unique global key generator, for example by using a static class with only a GetNextKey() method that combines the code above.
This should return an array of keys.
string.Split("-#");
When you just need the string:
string x = "(#8yg54w-#95jz#e-##9ixop-#7ps-#ny#9qv-#+pzbk5-#bp669x-#bp6696-#bp6696-#bp6696-#bp6696-#bp6696-#fbhstu-#ehddtk-####9py)";
Console.WriteLine(string.Join("-", x.Split("-#")));
You can use a Regular expression.
string input = "(#8yg54w-#95jz#e-##9ixop-#7ps-#ny#9qv-#+pzbk5-#bp669x-#bp6696-#bp6696-#bp6696-#bp6696-#bp6696-#fbhstu-#ehddtk-####9py)";
MatchCollection matches = Regex.Matches(input, @"(?<=\#)[A-Za-z1-9#]+(?=-)");
foreach (Match match in matches) {
Console.WriteLine(match.Value);
}
Output:
8yg54w
95jz#e
#9ixop
7ps
ny#9qv
bp669x
bp6696
bp6696
bp6696
bp6696
bp6696
fbhstu
ehddtk
Explanation of the regular expression (?<=\#)[A-Za-z1-9#]+(?=-)
General form (?<=prefix)find(?=suffix) finds the pattern find between a prefix and a suffix.
(?<=\#) prefix # (escaped with \).
[A-Za-z1-9#] character set to match (upper and lower case letters + digits 1 to 9 + #; note that 0 and + are not included).
+ quantifier: At least one character.
(?=-) suffix -.
I am not sure whether the ) is part of the string. To get the last key ###9py if the string contains ), use (?<=\#)[A-Za-z1-9#]+(?=-|\)) where \) is the right parenthesis escaped. If ) is not in there, use (?<=\#)[A-Za-z1-9#]+(?=-|$) where $ is the end of the string. | means OR, i.e., the suffix is either - OR ), or it is - OR $ (end of string).
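As an alternative sketch, a capturing group can be used instead of lookarounds so that keys containing + and the last key are picked up as well. This assumes that a key never contains - or ), and that the optional parentheses only appear around the whole string; the class name KeyExtractor is made up for illustration.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

public static class KeyExtractor
{
    // A key starts after a '#' that sits at the start of the string
    // (optionally behind an opening parenthesis) or right after a '-'.
    // Group 1 captures everything up to the next '-' or ')'.
    private static readonly Regex KeyPattern = new Regex(@"(?:^\(?|-)#([^-)]+)");

    public static List<string> GetKeys(string source)
    {
        var keys = new List<string>();
        foreach (Match match in KeyPattern.Matches(source))
            keys.Add(match.Groups[1].Value);
        return keys;
    }

    public static void Main()
    {
        string input = "(#8yg54w-#95jz#e-##9ixop-#7ps-#ny#9qv-#+pzbk5-#bp669x-#bp6696-#bp6696-#bp6696-#bp6696-#bp6696-#fbhstu-#ehddtk-####9py)";
        foreach (string key in GetKeys(input))
            Console.WriteLine(key);
    }
}
```

Only the first # of each key is treated as the separator, so ##9ixop yields #9ixop and ####9py yields ###9py.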

Get a regex expression for a key value pair separated by comma

I would like to get the regex expression for a key-value pair separated by comma.
input: "tag.Name:test,age:30,name:TestName123"
This is my attempt so far
string pattern = @".*:.*";
(I guess that .* indicates anything multiple times, followed by : and again anything multiple times.) If I include a comma at the end, ,*
string pattern = @".*:.*,*";
I assume it is the same thing, but it didn't work for me. The final result can be accomplished with LINQ, but I would like to not parse the input
A sample of my output
INPUT
string input = "tags.tagName:Tag1,tags.isRequired:false"
var finaRes = input.Split(',').Select(x => x.Split(':')).Select(x => new { Key = x.First(), Value= x.Last()});
OUTPUT:
Key Value
---------------|-------
tags.tagName | Tag1
tags.isRequired| false
You can use this regex:
(?<key>[^:]+):(?<value>[^,]+),?
Explanation:
(?<key>[^:]+) // this will match a 'key' - everything until colon char
(?<value>[^,]+) // this will match a 'value' - everything until comma char
C# example:
var regex = new Regex("(?<key>[^:]+):(?<value>[^,]+),?");
var input = "tag.Name:test,age:30,name:TestName123";
var matches = regex.Matches(input);
foreach (Match match in matches)
{
Console.Write(match.Groups["key"]);
Console.Write(" ");
Console.WriteLine(match.Groups["value"]);
}
Output will be:
tag.Name test
age 30
name TestName123
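If the pairs are needed for lookups rather than printing, the same matches can be collected into a dictionary. This is a minimal sketch reusing the regex above; the class name KeyValueParser is made up, and it assumes that when a key repeats, the last value should win.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

public static class KeyValueParser
{
    // Same pattern as above: key is everything up to ':', value is everything up to ','.
    private static readonly Regex PairPattern =
        new Regex(@"(?<key>[^:]+):(?<value>[^,]+),?");

    public static Dictionary<string, string> Parse(string input)
    {
        var result = new Dictionary<string, string>();
        foreach (Match match in PairPattern.Matches(input))
        {
            // The indexer overwrites an existing entry, so a repeated key keeps its last value.
            result[match.Groups["key"].Value] = match.Groups["value"].Value;
        }
        return result;
    }

    public static void Main()
    {
        var pairs = Parse("tag.Name:test,age:30,name:TestName123");
        foreach (var pair in pairs)
            Console.WriteLine(pair.Key + " = " + pair.Value);
    }
}
```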
Something along the lines of /([^,]+):([^,]+)/g should be able to achieve this. Note that this will allow spaces in the keys and values.
This will match each key value pair, and each match will contain 2 groups, group 1 being the key and 2 being the value.
Here is a useful tool that you can use to see how it works and test: https://regex101.com/r/m5KVfu/2
If you are okay with a lengthy regex, this might be useful:
((([a-zA-Z]*[0-9]*)*):([a-zA-Z0-9]+))(,([a-zA-Z]+[0-9]*)+:([a-zA-Z0-9]+))*
(([a-zA-Z]*[0-9]*)*) the key is meant to start with a letter and may or may not end with a digit (as written, the pattern accepts any run of letters and digits).
: colon, the key-value separator
([a-zA-Z0-9]+) the value may contain alphanumeric characters in any order; + indicates that a value is mandatory.
(,([a-zA-Z]+[0-9]*)+:([a-zA-Z0-9]+))* the * at the end indicates that further key-value pairs may or may not exist.
,([a-zA-Z]+[0-9]*)+ each further key should start with a letter and may or may not end with a digit; the + at the end indicates that if there is a comma then there must be another key.
: colon, the key-value separator
([a-zA-Z0-9]+) the mandatory value for that key

Find String Between Two Identical Control Separators?

I'm reading from a file, and need to find a string that is encapsulated by two identical non-printable values/control separators, in this case 'RS'
How would I go about doing this? Would I need some form of regex?
RS stands for Record Separator, and it has a value of 30 (or 0x1E in hexadecimal). You can use this regular expression:
\x1E([\w\s]*?)\x1E
That matches the RS, then lazily matches any run of letters, digits, underscores or whitespace, and then the RS again. The ? makes the regex match as few characters as possible, in case there are more RS characters afterwards.
If you prefer not to match numbers, you could use [a-zA-Z\s] instead of [\w\s].
Example:
string fileContents = "Something \u001Eyour string\u001E more things \u001Eanother text\u001E end.";
MatchCollection matches = Regex.Matches(fileContents, @"\x1E([\w\s]*?)\x1E");
if (matches.Count == 0)
return; // Not found, display an error message and exit.
foreach (Match match in matches)
{
if (match.Groups.Count > 1)
Console.WriteLine(match.Groups[1].Value);
}
As you can see, you get a collection of Match objects, and each match.Value holds the whole matched string including the separators. match.Groups holds all matched groups: the first one is again the whole matched string (that's the default), followed by each of your groups (those between parentheses). In this case the regex has only one group, so you just need the second entry in that list.
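Using regex you can do something like this:
// Escape the delimiters so any special regex characters in them are treated literally
string pattern = string.Format("{0}(.*?){1}", Regex.Escape(firstString), Regex.Escape(secondString));
var matches = Regex.Matches(myString, pattern);
foreach (Match match in matches)
{
    // Groups[1] holds the text between the delimiters, without firstString and secondString
    string inner = match.Groups[1].Value;
    // Do stuff with inner
}
The pattern is built first, and Regex.Matches then finds every string that matches it. The lazy .*? stops at the nearest closing delimiter, which matters when both delimiters are identical.
Regex.Escape takes care of escaping any special regex characters in the delimiters.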
You can use Regex.Matches, I'm using X as the separator in this example:
var fileContents = "Xsomething1X Xsomething2X Xsomething3X";
var results = Regex.Matches(fileContents, @"(X).*?(\1)");
Then you can loop on results to do anything you want with the matches.
The \1 in the regex means "reference first group". I've put X between () so it is going to be group 1, then I use \1 to say that the match in this place should be exactly the same as group 1.
You don't need a regular expression for that.
Read the contents of the file (File.ReadAllText).
Split on the separator character (String.Split).
If you know there's only one occurrence of your string, take the second array element (result[1]). Otherwise, take every other entry (result.Where((x, i) => i % 2 == 1)).
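The split-based steps above could be sketched as follows. The class and method names are made up for illustration, and the sample text stands in for the result of File.ReadAllText. The last array element is deliberately skipped, because it can never be followed by a closing separator.

```csharp
using System;
using System.Collections.Generic;

public static class RecordSeparatorSplitter
{
    // Returns the substrings that sit between a pair of separator characters.
    public static List<string> Extract(string text, char separator)
    {
        string[] parts = text.Split(separator);
        var enclosed = new List<string>();

        // Odd indexes are the pieces between an opening and a closing separator.
        // Stopping before the last element ignores text after an unpaired separator.
        for (int i = 1; i < parts.Length - 1; i += 2)
            enclosed.Add(parts[i]);

        return enclosed;
    }

    public static void Main()
    {
        string fileContents = "Something \u001Eyour string\u001E more things \u001Eanother text\u001E end.";
        foreach (string value in Extract(fileContents, '\u001E'))
            Console.WriteLine(value);
    }
}
```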

Regular expression match substring

I tried to create a regular expression which pulls everything that matches:
[aA-zZ]{2}[0-9]{5}
The problem is that I want to exclude from matching when I have eg. ABCD12345678
Can anyone help me resolve this?
EDIT1:
I am looking for two letters and five digits in the string, but I want to exclude from matching when I have a string like ABCD12345678, because when I use the above regular expression it will return CD12345.
EDIT2:
I didn't check everything but I think I found answer:
WHEN field is null then field
WHEN fnRegExMatch(field, '[a-zA-Z]{2}[0-9]{5}') = 'N/A' THEN field
WHEN field like '%[^a-z][a-z][a-z][0-9][0-9][0-9][0-9][0-9][^0-9]%' or field like '[a-z][a-z][0-9][0-9][0-9][0-9][0-9][^0-9]%' THEN fnRegExMatch(field, '[a-zA-Z]{2}[0-9]{5}')
ELSE field
First, [aA-zZ] doesn't make sense: it is read as a, the range A-z (which also covers [, \, ], ^, _ and `), and Z. Second, use word boundaries:
\b[a-zA-Z]{2}[0-9]{5}\b
You could also use case insensitive modifier:
(?i)\b[a-z]{2}[0-9]{5}\b
According to your comment, it seems you may have an underscore after the five digits. In this case, the word boundary doesn't work, and you have to use this instead:
(?i)(?<![a-z])([a-z]{2}[0-9]{5})(?![0-9])
(?<![a-z]) is a negative lookbehind that asserts there is no letter before the two mandatory letters
(?![0-9]) is a negative lookahead that asserts there is no digit after the five mandatory digits
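A small sketch of how the lookaround version behaves on the inputs discussed in the question. The class name CodeFinder and the sample strings other than ABCD12345678 are made up for illustration; the capturing parentheses are dropped because only the whole match is used.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

public static class CodeFinder
{
    // Two letters not preceded by a letter, then five digits not followed by a digit.
    private static readonly Regex CodePattern =
        new Regex(@"(?i)(?<![a-z])[a-z]{2}[0-9]{5}(?![0-9])");

    public static List<string> Find(string input)
    {
        var codes = new List<string>();
        foreach (Match match in CodePattern.Matches(input))
            codes.Add(match.Value);
        return codes;
    }

    public static void Main()
    {
        foreach (string sample in new[] { "ABCD12345678", "ID AB12345_x", "xy54321", "AB123456" })
            Console.WriteLine(sample + " -> " + string.Join(",", Find(sample)));
    }
}
```

ABCD12345678 and AB123456 produce no match, while AB12345 is still found when it is followed by an underscore.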
This would be the code, along with usage samples.
public static Regex regex = new Regex(
"\\b[a-zA-Z]{2}\\d{5}\\b",
RegexOptions.CultureInvariant
| RegexOptions.Compiled
);
//// Replace the matched text in the InputText using the replacement pattern
// string result = regex.Replace(InputText,regexReplace);
//// Split the InputText wherever the regex matches
// string[] results = regex.Split(InputText);
//// Capture the first Match, if any, in the InputText
// Match m = regex.Match(InputText);
//// Capture all Matches in the InputText
// MatchCollection ms = regex.Matches(InputText);
//// Test to see if there is a match in the InputText
// bool IsMatch = regex.IsMatch(InputText);
//// Get the names of all the named and numbered capture groups
// string[] GroupNames = regex.GetGroupNames();
//// Get the numbers of all the named and numbered capture groups
// int[] GroupNumbers = regex.GetGroupNumbers();

How to do this Regex in C#?

I've been trying to do this for quite some time but for some reason never got it right.
There will be texts like these:
12325 NHGKF
34523 KGJ
29302 MMKSEIE
49504EFDF
The rule is there will be EXACTLY 5 digit number (no more or less) after that a 1 SPACE (or no space at all) and some text after as shown above. I would like to have a MATCH using a regex pattern and extract THE NUMBER and SPACE and THE TEXT.
Is this possible? Thank you very much!
Since from your wording you seem to need to be able to get each component part of the input text on a successful match, then here's one that'll give you named groups number, space and text so you can get them easily if the regex matches:
(?<number>\d{5})(?<space>\s?)(?<text>\w+)
On the returned Match, if Success==true then you can do:
string number = match.Groups["number"].Value;
string text = match.Groups["text"].Value;
bool hadSpace = match.Groups["space"].Length > 0; // the group always exists; it is empty when there was no space
The expression is relatively simple:
^([0-9]{5}) ?([A-Z]+)$
That is, 5 digits, an optional space, and one or more upper-case letters. The anchors at both ends ensure that the entire input is matched.
The parentheses around the digits pattern and the letters pattern designate capturing groups one and two. Access them to get the number and the word.
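A minimal sketch of reading those two groups, using the sample lines from the question. The class and method names are made up for illustration; the anchors are what enforce "exactly 5 digits", since a sixth digit can match neither the optional space nor the letters.

```csharp
using System;
using System.Text.RegularExpressions;

public static class LineParser
{
    // Group 1: exactly five digits; group 2: one or more upper-case letters.
    private static readonly Regex LinePattern = new Regex(@"^([0-9]{5}) ?([A-Z]+)$");

    public static bool TryParse(string line, out string number, out string text)
    {
        Match match = LinePattern.Match(line);
        number = match.Success ? match.Groups[1].Value : null;
        text = match.Success ? match.Groups[2].Value : null;
        return match.Success;
    }

    public static void Main()
    {
        foreach (string line in new[] { "12325 NHGKF", "34523 KGJ", "29302 MMKSEIE", "49504EFDF", "123456 ABC" })
        {
            if (TryParse(line, out string number, out string text))
                Console.WriteLine("number: " + number + ", text: " + text);
            else
                Console.WriteLine("no match: " + line);
        }
    }
}
```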
string test = "12345 SOMETEXT";
string[] result = Regex.Split(test, @"(\d{5})\s*(\w+)");
You could use the Split method:
public class Program
{
static void Main()
{
var values = new[]
{
"12325 NHGKF",
"34523 KGJ",
"29302 MMKSEIE",
"49504EFDF"
};
foreach (var value in values)
{
var tokens = Regex.Split(value, @"(\d{5})\s*(\w+)");
Console.WriteLine("key: {0}, value: {1}", tokens[1], tokens[2]);
}
}
}
