Regex: start with a fixed string and end with a variable number - C#

I need to match strings that start with the fixed string "These masters were created using" and end with a variable part in the form [space][name][space][char][number]:
These masters were created using Kevin O014
These masters were created using Jhon A039
These masters were created using Geeth P034
These masters were created using Jemes M077
These masters were created using Anne H058
These masters were created using JANE S345
Any ideas?
I tried this: ^(These masters were created using).\s.[a-zA-Z].\s.[a-zA-Z].[0-9]{3}.$ but it looks like Greek to me.

I would use this:
These masters were created using [a-zA-Z]+ [a-zA-Z]\d+
See demo
[a-zA-Z]+ matches a name (assuming simple names: no hyphens, no accented characters).
[a-zA-Z]\d+ matches a letter followed by one or more digits. You might change it to [a-zA-Z]\d{3} if you need exactly 3 digits.
string input = @" </span>These masters were created using Smith J054<br>";
Regex regex = new Regex(@"These masters were created using [a-zA-Z]+ ([a-zA-Z]\d+)");
foreach (Match match in regex.Matches(input))
{
    Console.Out.WriteLine("Found a match : " + match);
    // Groups[0] is the whole match; Groups[1] is the letter-plus-digits code
    if (match.Groups.Count >= 2)
        Console.Out.WriteLine("Extract " + match.Groups[1].Value);
}
Output:
Found a match : These masters were created using Smith J054
Extract J054

^These masters were created using [[a-zA-z\S]* [[A-Za-z0-9]*$
I verified that it matches, with the multiline option enabled, in an online regex tester.

To not only validate, but also to get the names and codes at the end, you can use
\bThese masters were created using (?<Name>[A-Za-z]+)\s+(?<Code>[A-Za-z][0-9]{3})\b
See the regex demo
where (?<Name>[A-Za-z]+) matches one or more ASCII letters and captures them into the group named "Name", \s+ matches one or more whitespace characters, and (?<Code>[A-Za-z][0-9]{3}) matches a letter followed by exactly 3 digits and captures it into the group named "Code" (use + instead of {3} to match one or more digits).
Note that \b are word boundaries that help match the strings inside larger strings as whole words.
In C#:
var pattern = @"\bThese masters were created using (?<Name>\p{L}+)\s+(?<Code>\p{L}\d{3})\b";
Sample C# demo:
using System;
using System.IO;
using System.Linq;
using System.Text.RegularExpressions;
public class Test
{
    public static void Main()
    {
        var s = "aaaaaa These masters were created using Kevin O014\nThese masters were created using Jhon A039\nThese masters were created using Geeth P034\nThese masters were created using Jemes M077\nThese masters were created using Anne H058\nThese masters were created using JANE S345 aaaaa";
        // Collect "Name: Code" for every match found in the larger string
        var matches = Regex.Matches(s, @"\bThese masters were created using (?<Name>[A-Za-z]+)\s+(?<Code>[A-Za-z][0-9]{3})\b")
            .Cast<Match>()
            .Select(m => string.Format("{0}: {1}", m.Groups["Name"].Value, m.Groups["Code"].Value))
            .ToList();
        Console.WriteLine(string.Join("\n", matches));
    }
}

Related

Split the String and get different output

I have 2 different strings that I need to split to get the output shown below. The solutions I have tried didn't work, and I am stuck.
Input
"12.2 - Chemicals, products and including,14.0 - Plastic products ,17.2 - Metal Products ,19.1 - and optical equipment (excluding and other electronic components; semiconductors; bare printed circuit boards; opti, watches)"
Output
"12.2, 14.0, 17.2, 19.1"
The other case [updated as per new codes]:
Input
"MD 0102.2.3 - Test 123 ,MD 0102.2.5 - Test Hello ,MD 1101.2 - Dialysis and blood treatment, MDM 123 - Test, MDM 12.32.0 - Test 12"
Output
"MD 0102.2.3, MD 0102.2.5,MD 1101.2,MDM 123,MDM 12.32.0"
I don't understand what logic I need to extract these.
You can solve both tasks with the help of regular expressions and LINQ. The only difference is in the patterns (fiddle):
using System;
using System.Linq;
using System.Text;
using System.Text.RegularExpressions;
...
string input = "12.2 - Chemicals, products and including,14.0 - Plastic products ,17.2 - Metal Products ,19.1 - and optical equipment ...";
string pattern = @"[0-9]{1,2}\.[0-9]";
// Take the text of every match, in order of appearance
string[] result = Regex
    .Matches(input, pattern)
    .Cast<Match>()
    .Select(match => match.Value)
    .ToArray();
Console.Write(string.Join(", ", result));
Note that only the pattern differs:
using System;
using System.Linq;
using System.Text;
using System.Text.RegularExpressions;
...
string input = "DM 0405 - trtststodfm, fhreuu ,RD 3756 - yeyerffydydyd";
string pattern = @"[A-Z]{2}\s[0-9]{4}";
string[] result = Regex
    .Matches(input, pattern)
    .Cast<Match>()
    .Select(match => match.Value)
    .ToArray();
Console.Write(string.Join(", ", result));
Patterns explained:
[A-Z]{2}\s[0-9]{4}
[A-Z]{2} - 2 capital English letters in A..Z range
\s - white space
[0-9]{4} - 4 digits in 0..9 range
[0-9]{1,2}\.[0-9]
[0-9]{1,2} - from 1 up to 2 digits in 0..9 range
\. - dot .
[0-9] - digit in 0..9 range
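Neither pattern above covers the updated codes from the question (MD 0102.2.3, MDM 123, MDM 12.32.0). A minimal sketch of one way to handle them, assuming that every code is 2 or 3 capital letters, a space, and one or more dot-separated digit groups, and that it is always followed by " - ". The class and method names are illustrative only:

```csharp
using System;
using System.Linq;
using System.Text.RegularExpressions;

public static class CodeExtractor
{
    // Assumed shape: 2-3 capital letters, whitespace, digits with optional ".digits" parts.
    // The lookahead requires " - " after the code without consuming it.
    private const string Pattern = @"\b[A-Z]{2,3}\s[0-9]+(?:\.[0-9]+)*(?=\s+-\s)";

    public static string[] Extract(string input)
    {
        return Regex.Matches(input, Pattern)
            .Cast<Match>()
            .Select(m => m.Value)
            .ToArray();
    }

    public static void Main()
    {
        string input = "MD 0102.2.3 - Test 123 ,MD 0102.2.5 - Test Hello ,MD 1101.2 - Dialysis and blood treatment, MDM 123 - Test, MDM 12.32.0 - Test 12";
        Console.WriteLine(string.Join(", ", Extract(input)));
    }
}
```

The lookahead keeps fragments such as "Test 123" out of the result even if an item description happens to contain capital letters followed by digits, as long as it is not followed by " - ".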
If the input always has the form {id}-{item},{id}-{item}, I would split the string on the ',' character and then search the resulting collection with LINQ and a regex.
You would need to know which formats the item ID can take, though.
If it always looks like
12.2, 14.0, 17.2, 19.1
then a simple regex like the one below suffices, and you can use it in your LINQ query.
new Regex(@"(\d*\.\d*)")
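A rough sketch of the split-then-match idea, assuming IDs of the form digits.digits at the start of a segment. Because the item descriptions themselves contain commas, some segments carry no ID and have to be filtered out. The stricter \d+\.\d+ is used here instead of \d*\.\d* so that a lone dot cannot match; the class and method names are made up for the example:

```csharp
using System;
using System.Linq;
using System.Text.RegularExpressions;

public static class SplitThenMatch
{
    public static string[] ExtractIds(string input)
    {
        // Anchored at the start of each trimmed segment: digits, dot, digits.
        var idAtStart = new Regex(@"^\d+\.\d+");
        return input
            .Split(',')
            .Select(segment => idAtStart.Match(segment.Trim()))
            .Where(m => m.Success) // segments such as " products and including" carry no ID
            .Select(m => m.Value)
            .ToArray();
    }

    public static void Main()
    {
        string input = "12.2 - Chemicals, products and including,14.0 - Plastic products ,17.2 - Metal Products ,19.1 - and optical equipment (excluding and other electronic components; semiconductors; bare printed circuit boards; opti, watches)";
        Console.WriteLine(string.Join(", ", ExtractIds(input)));
    }
}
```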
You could use this regex: ((\w{2} \d{4})|\d+\.\d+)(?=( - ))
Here's a fiddle demonstrating it: https://dotnetfiddle.net/0pTqFQ
using System;
using System.Text.RegularExpressions;
public class Program
{
    public static void Main()
    {
        string input1 = "12.2 - Chemicals, products and including,14.0 - Plastic products ,17.2 - Metal Products ,19.1 - and optical equipment (excluding and other electronic components; semiconductors; bare printed circuit boards; opti, watches)";
        string input2 = "DM 0405 - trtststodfm, fhreuu ,RD 3756 - yeyerffydydyd";
        // One pattern for both inputs; the lookahead requires " - " after the ID
        var regex1 = new Regex(@"((\w{2} \d{4})|\d+\.\d+)(?=( - ))");
        var matches1 = regex1.Matches(input1);
        var matches2 = regex1.Matches(input2);
        PrintMatches(matches1);
        PrintMatches(matches2);
    }
    private static void PrintMatches(MatchCollection matches)
    {
        foreach (var match in matches)
        {
            Console.WriteLine(match);
        }
    }
}

C# Replace and Remove text

I have a small problem with replacing and removing text in a label.
label1.Text = Users online: 1 browsing: 1 pages
I am using gethtmldocument to get the label1.Text shown above. I want the text to show only Users online: (number).
Right now I am using label1.Text.Remove(17), which gives me Users online: 1. The problem is that once the user count reaches 10 or more, the text still shows 1 instead of 10.
I also tried label1.Text.Replace("browsing: 1 pages", ""), but when more users are online, browsing: 1 pages changes to browsing: 2 pages or some other number.
So my question is: how can I get only the Users online: ??? part of the text?
Thank you.
Try using regular expressions: match the groups and represent them in the desired way:
using System.Text.RegularExpressions;
...
string source = "Users online: 479 browsing: 153 pages";
// match.Groups["text"] - "Users online: "
// match.Groups["number"] - "479"
var match = Regex.Match(source, "^(?<text>.*?)(?<number>[0-9]+)");
// Users online: (479)
label1.Text = $"{match.Groups["text"].Value.Trim()} ({match.Groups["number"].Value})";
Edit: Regular expression's pattern ^(?<text>.*?)(?<number>[0-9]+) explanation:
^ - anchor: string's beginning
(?<text> ...) - group named "text" which contains
.*? - any characters, as few as possible
(?<number> ...) - group named "number" which contains
[0-9]+ - digits (char in [0..9] range); "+" - at least one
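You could try using Substring. Something like this:
var x = label1.Text; // the full text, e.g. "Users online: 1 browsing: 1 pages"
var textToDisplay = x.Substring(0, x.IndexOf("browsing")).TrimEnd(); // keep everything before "browsing"
label1.Text = textToDisplay;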

need help in finding pattern in string using Regular Expression using C#

I have a string in following format..
"ABC 12.23-22-22-11|-ABC 33.20-ABC 44.00-ABC 11.00|ABC 12.23-22-22-11|-ABC 33.20-ABC 44.00-ABC11.00|ABC 12.23-22-22-11|-ABC 33.20-ABC 44.00-ABC 11.00";
What I am trying to do is find the next group which starts with pipe but is not followed by a -
So the above string will point to 3 sections such as
ABC 12.23-22-22-11|-ABC 33.20-ABC 44.00-ABC 11.00
ABC 12.23-22-22-11|-ABC 33.20-ABC 44.00-ABC 11.00
ABC 12.23-22-22-11|-ABC 33.20-ABC 44.00-ABC 11.00
I played around with following code but it doesn't seem to do anything, it is not giving me the position of the next block where pipe char is which is not followed by a dash (-)
String pattern = @"^+|[A-Z][A-Z][A-Z]$";
In the above my logic is
1:Start from the beginning
2:Find a pipe character which is not followed by a dash char
3:Return its position
4:Which I will eventually use to substring the blocks
5:And do this till the end of the string
Pls be kind as I have no idea how regex works, I am just making an attempt to use it. Thanks, language is C#
You can use the Regex.Split method with a pattern of \|(?!-).
Notice that you need to escape the | character, since it is a regex metacharacter used for alternation. The (?!-) is a negative lookahead that prevents a match when the | is immediately followed by a dash.
var pattern = @"\|(?!-)";
var results = Regex.Split(input, pattern);
foreach (var match in results)
{
    Console.WriteLine(match);
}
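The question also asks for the position of each qualifying pipe. A small sketch of how the same pattern could report those positions through Match.Index; the class and method names are illustrative only:

```csharp
using System;
using System.Linq;
using System.Text.RegularExpressions;

public static class PipePositions
{
    // Returns the index of every '|' that is not immediately followed by '-'.
    public static int[] Find(string input)
    {
        return Regex.Matches(input, @"\|(?!-)")
            .Cast<Match>()
            .Select(m => m.Index)
            .ToArray();
    }

    public static void Main()
    {
        var input = "ABC 12.23-22-22-11|-ABC 33.20-ABC 44.00-ABC 11.00|ABC 12.23-22-22-11|-ABC 33.20-ABC 44.00-ABC11.00|ABC 12.23-22-22-11|-ABC 33.20-ABC 44.00-ABC 11.00";
        foreach (int position in Find(input))
        {
            Console.WriteLine(position);
        }
    }
}
```

These indexes can then be fed to Substring if the blocks need to be cut out manually, although Regex.Split already does that in one step.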
My Regex logic for this was:
the delimiter is pipe "[|]"
we will gather a series of characters that are not our delimiter
"(" not our delimiter ")" but at least one character "+"
"[^|]" is not our delimiter
"[|][-]" is also not our delimiter
Variable "pattern" could use a "*" instead of "+" if empty segments are acceptable. The pattern ends with a "?" since our final string segment (in your example) does not have a pipe character.
using System;
using System.Diagnostics;
using System.Text.RegularExpressions;
namespace ConsoleTest1
{
    class Program
    {
        static void Main(string[] args)
        {
            var input = "ABC 12.23-22-22-11|-ABC 33.20-ABC 44.00-ABC 11.00|ABC 12.23-22-22-11|-ABC 33.20-ABC 44.00-ABC11.00|ABC 12.23-22-22-11|-ABC 33.20-ABC 44.00-ABC 11.00";
            var pattern = "([^|]|[|][-])+[|]?";
            Match m;
            m = Regex.Match(input, pattern);
            while (m.Success) {
                Debug.WriteLine(String.Format("Match from {0} for {1} characters", m.Index, m.Length));
                Debug.WriteLine(input.Substring(m.Index, m.Length));
                m = m.NextMatch();
            }
        }
    }
}
Output is:
Match from 0 for 50 characters
ABC 12.23-22-22-11|-ABC 33.20-ABC 44.00-ABC 11.00|
Match from 50 for 49 characters
ABC 12.23-22-22-11|-ABC 33.20-ABC 44.00-ABC11.00|
Match from 99 for 49 characters
ABC 12.23-22-22-11|-ABC 33.20-ABC 44.00-ABC 11.00

Why does this loop through Regex groups print the output twice?

I have written this very straightforward regex code:
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text;
using System.Text.RegularExpressions;
using System.Threading.Tasks;
namespace RegexTest1
{
    class Program
    {
        static void Main(string[] args)
        {
            string a = "\"foobar123==\"";
            Regex r = new Regex("^\"(.*)\"$");
            Match m = r.Match(a);
            if (m.Success)
            {
                foreach (Group g in m.Groups)
                {
                    Console.WriteLine(g.Index);
                    Console.WriteLine(g.Value);
                }
            }
        }
    }
}
However the output is
0
"foobar123=="
1
foobar123==
I don't understand why it prints twice. Why should there be a capture at index 0, when my regex starts with ^\" and I am not capturing that part?
Sorry if this is very basic, but I don't write regexes on a daily basis.
As I see it, this code should print only once, the index should be 1, and the value should be foobar123==.
This happens because group zero is special: it returns the entire match.
From the Regex documentation (emphasis added):
A simple regular expression pattern illustrates how numbered (unnamed) and named groups can be referenced either programmatically or by using regular expression language syntax. The regular expression ((?<One>abc)\d+)?(?<Two>xyz)(.*) produces the following capturing groups by number and by name. The first capturing group (number 0) always refers to the entire pattern.
#  Name              Group
-  ----------------  --------------------------------
0  0 (default name)  ((?<One>abc)\d+)?(?<Two>xyz)(.*)
1  1 (default name)  ((?<One>abc)\d+)
2  2 (default name)  (.*)
3  One               (?<One>abc)
4  Two               (?<Two>xyz)
If you do not want to see it, start the output from the first group.
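A minimal sketch of starting from the first group with an index-based loop, reusing the string and pattern from the question. The class and method names are made up for the example:

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

public static class SkipGroupZero
{
    // Collects the values of the capturing groups only, leaving out group 0 (the whole match).
    public static List<string> CapturedValues(Match m)
    {
        var values = new List<string>();
        for (int i = 1; i < m.Groups.Count; i++)
        {
            values.Add(m.Groups[i].Value);
        }
        return values;
    }

    public static void Main()
    {
        Match m = Regex.Match("\"foobar123==\"", "^\"(.*)\"$");
        if (m.Success)
        {
            foreach (string value in CapturedValues(m))
            {
                Console.WriteLine(value);
            }
        }
    }
}
```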
A regex captures several groups at once. Group 0 is the entire matched region (including the quotes). Group 1 is the group defined by the parentheses.
Say your regex has the following form:
A(B(C)D)E
where A, B, C, D and E are regular expressions.
Then the following groups will be matched:
0 A(B(C)D)E
1 B(C)D
2 C
The i-th group starts at the i-th opening parenthesis. You can think of the "zero-th" opening parenthesis as implicitly placed at the beginning of the regex (with its closing parenthesis at the end of the regex).
If you want to omit group 0, you can use the LINQ Skip method. Calling Cast<Group>() first keeps this working on runtimes where GroupCollection only implements the non-generic IEnumerable:
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text;
using System.Text.RegularExpressions;
using System.Threading.Tasks;
namespace RegexTest1 {
    class Program {
        static void Main(string[] args) {
            string a = "\"foobar123==\"";
            Regex r = new Regex("^\"(.*)\"$");
            Match m = r.Match(a);
            if (m.Success) {
                foreach (Group g in m.Groups.Cast<Group>().Skip(1)) { // skip the first group (group 0)
                    Console.WriteLine(g.Index);
                    Console.WriteLine(g.Value);
                }
            }
        }
    }
}
0
"foobar123==" -- Matched string.
Entire match by a pattern would be found at index 0.
1
foobar123== -- Captured string.
group index 1 contains the characters which are captured by the first capturing group.
Using @dasblinkenlight's regex as an example...
This is not the whole story with .NET capture group counting.
When named groups are added, the default is to count them, and to count them last.
These defaults can optionally be changed.
Of course, group 0 always contains the entire match. Group counting really starts at 1
because you can't specify a backreference (in the regex) to group 0; it conflicts
with the \0000 character escape.
Here is the counting with named and normal groups in the .NET default state.
( # (1 start)
(?<One> abc ) #_(3)
\d+
)? # (1 end)
(?<Two> xyz ) #_(4)
( .* ) # (2)
Here it is with names last turned OFF.
( # (1 start)
(?<One> abc ) # (2)
\d+
)? # (1 end)
(?<Two> xyz ) # (3)
( .* ) # (4)
Here it is with named counting turned OFF.
( # (1 start)
(?<One> abc )
\d+
)? # (1 end)
(?<Two> xyz )
( .* ) # (2)
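The default numbering (unnamed groups first, named groups last) can be checked at run time with Regex.GetGroupNames and Regex.GroupNumberFromName. This sketch covers only the default state and does not try to reproduce the two alternative schemes listed above; the class and helper names are illustrative:

```csharp
using System;
using System.Text.RegularExpressions;

public static class GroupNumbering
{
    private static readonly Regex Sample =
        new Regex(@"((?<One>abc)\d+)?(?<Two>xyz)(.*)");

    // Returns the number .NET assigned to the group with the given name.
    public static int NumberOf(string groupName)
    {
        return Sample.GroupNumberFromName(groupName);
    }

    public static void Main()
    {
        // Lists every group name together with its assigned number.
        foreach (string name in Sample.GetGroupNames())
        {
            Console.WriteLine("{0} -> {1}", name, NumberOf(name));
        }
    }
}
```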
You can get a single result by making group 1 non-capturing with ?:
Regex r = new Regex("^\"(?:.*)\"$");
Online Demo
Every time you use () you create a group that you can reference later with backreferences such as $1, $2, $3. In the case of your expression, it is simpler to write:
Regex r = new Regex("^\".*\"$");
which does not use parentheses at all.

Regex to not match a certain sequence

I have a text file as below:
1.1 - Hello
1.2 - world!
2.1 - Some
data
here and it contains some 32 digits so i cannot use \D+
2.2 - Etc..
So I want a regex that gets 4 matches in this case, one for each point. My regex doesn't work as I wish. Please advise:
private readonly Regex _reactionRegex = new Regex(@"(\d+)\.(\d+)\s*-\s*(.+)", RegexOptions.Compiled | RegexOptions.Singleline);
Even this regex isn't very helpful:
(\d+)\.(\d+)\s*-\s*(.+)(?<!\d+\.\d+)
Alex, this regex will do it:
(?sm)^\d+\.\d+\s*-\s*((?:.(?!^\d+\.\d+))*)
This is assuming that you want to capture the point, without the numbers, for instance: just Hello
If you want to also capture the digits, for instance 1.1 - Hello, you can use the same regex and display the entire match, not just Group 1. The online demo below will show you both.
How does it work?
The idea is to capture the text you want to Group 1 using (parentheses).
We match in multi-line mode m to allow the anchor ^ to work on each line.
We match in dotall mode s to allow the dot to eat up strings on multiple lines
We use a negative lookahead (?! to stop eating characters when what follows is the beginning of the line with your digit marker
Here is full working code and an online demo.
using System;
using System.Text.RegularExpressions;
using System.Collections.Specialized;
class Program {
    static void Main() {
        string yourstring = @"1.1 - Hello
1.2 - world!
2.1 - Some
data
here and it contains some 32 digits so i cannot use \D+
2.2 - Etc..";
        var resultList = new StringCollection();
        try {
            var yourRegex = new Regex(@"(?sm)^\d+\.\d+\s*-\s*((?:.(?!^\d+\.\d+))*)");
            Match matchResult = yourRegex.Match(yourstring);
            while (matchResult.Success) {
                resultList.Add(matchResult.Groups[1].Value);
                Console.WriteLine("Whole Match: " + matchResult.Value);
                Console.WriteLine("Group 1: " + matchResult.Groups[1].Value + "\n");
                matchResult = matchResult.NextMatch();
            }
        } catch (ArgumentException ex) {
            // Syntax error in the regular expression
        }
        Console.WriteLine("\nPress Any Key to Exit.");
        Console.ReadKey();
    } // END Main
} // END Program
This may do what you're looking for, though the expected result is somewhat ambiguous.
(\d+)\.(\d+)\s*-\s*(.+?)(\n)(?>\d|$)
The ambiguity is for example what would you expect to match if data looked like:
1.1 - Hello
1.2 - world!
2.1 - Some
data here and it contains some
32 digits so i cannot use \D+
2.2 - Etc..
Not clear if 32 here starts a new record or not.
