Regex string for phrase - c#

I am trying to extract a ticket number with the phrase "Ticket ID: (20 digit number)"
This phrase can be written as:
"Ticket ID: (20 digit number)"
"TicketID:(20 digit number)" - Spaces do not matter
Here is the regex string I am using that fails to work. I am trying to understand what I am doing wrong here. This regex should look for any phrase, regardless of spacing, with the word Ticket followed by ID: followed by a 20-digit number of any kind.
Regex newExpression = new Regex(@"\bTicket\b.*\bID:\b.*\d{20}",
    RegexOptions.IgnoreCase
    | RegexOptions.Singleline
    | RegexOptions.IgnorePatternWhitespace);

With this pattern you get the number directly, since the lookbehind (?<=...) is only a check and is not part of the match result:
Regex newExpression = new Regex(@"(?<=\bTicket\s*ID\s*:\s*)\d{20}",
    RegexOptions.IgnoreCase);

There is no word boundary (\b) between a : and a following space, because neither of them is a word character. Just use \s* to ignore spaces:
Regex newExpression = new Regex(@"Ticket\s*ID:\s*(\d{20})");
Now you can use newExpression.Match(someString).Groups[1].Value.
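To see how the capture-group approach behaves on both spellings, here is a minimal sketch. The class name TicketIdDemo and the helper ExtractTicketId are hypothetical names chosen for illustration, and RegexOptions.IgnoreCase plus the extra \s* before the colon are assumptions borrowed from the other answers in this thread.

```csharp
using System;
using System.Text.RegularExpressions;

class TicketIdDemo
{
    // Hypothetical helper: returns the 20-digit ticket number, or null when the phrase is absent.
    public static string ExtractTicketId(string text)
    {
        // \s* tolerates any amount of whitespace (including none) around "ID" and the colon.
        var m = Regex.Match(text, @"Ticket\s*ID\s*:\s*(\d{20})", RegexOptions.IgnoreCase);
        return m.Success ? m.Groups[1].Value : null;
    }

    static void Main()
    {
        Console.WriteLine(ExtractTicketId("Ticket ID: 12345678901234567890"));
        Console.WriteLine(ExtractTicketId("ticketid:12345678901234567890"));
        // Fewer than 20 digits: no match, so the fallback text is printed.
        Console.WriteLine(ExtractTicketId("Ticket ID: 1234") ?? "(no match)");
    }
}
```

Note that \d{20} also matches the first 20 digits of a longer run; appending (?!\d) to the pattern would reject those, if that matters for your data.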

Just skip everything before the :, like
@".*\s*\d{20}"

Well, you can try this code. It captures the 20-digit number, together with the literal parentheses around it, as a named capture group:
var newExpression = new Regex(//@"\bTicket\b.*\bID:\b.*\d{20}",
    @"Ticket\s*ID\:\s*(?<num>\(\d{20}\))",
    RegexOptions.IgnoreCase
    | RegexOptions.Singleline
    | RegexOptions.IgnorePatternWhitespace);
var items=new List<string>();
var r=new Random();
var sb=new StringBuilder();
var i=0;
var formats=new []{"TicketID:({0})", "Ticket ID:({0})", "Ticket ID: ({0})",
    "Ticket ID: ({0})"};
// Build 15 random test strings; Next(0,10) yields the digits 0-9 (the upper bound is exclusive).
for(;i<5;i++){
    for(var j=0;j<20;j++){
        sb.Append(r.Next(0,10));
    }
    items.Add(string.Format(formats[r.Next(0,4)],sb));
    sb.Remove(0,20);
}
for(;i<10;i++){
    for(var j=0;j<20;j++){
        sb.Append(r.Next(0,10));
    }
    items.Add(string.Format(formats[r.Next(0,2)],sb));
    sb.Remove(0,20);
}
for(;i<15;i++){
    for(var j=0;j<20;j++){
        sb.Append(r.Next(0,10));
    }
    items.Add(string.Format(formats[r.Next(0,2)],sb));
    sb.Remove(0,20);
}
// Print each input next to the value captured by the named group.
foreach(var s in items){
    var m = newExpression.Match(s);
    if(m.Success){
        string.Format("{0} - {1}",s,m.Groups["num"].Value).Dump();
    }
}
NOTE: This was run in LINQPad; Dump() is a LINQPad extension method.

Related

Extract number after specific word and first open parenthesis using regex in C#

I want to get a string from a sentence that starts with the word Id:
Letter received for the claim
Id: Sanjay Kumar (12345678 / NA123456789)
Dear Customer find the report
op: Id: Sanjay Kumar (12345678 / NA123456789)
Exp op: 12345678
Code
var regex = new Regex(@"[\n\r].*Id:\s*([^\n\r]*)");
var useridText = regex.Match(extractedDocContent).Value;
You can use
var regex = new Regex(@"(?<=Id:[^()]*\()\d+");
See the regex demo.
Details:
(?<=Id:[^()]*\() - a positive lookbehind that matches a location that is immediately preceded with Id: + zero or more chars other than ( and ) + (
\d+ - one or more digits.
Consider also a non-lookbehind approach:
var pattern = @"Id:[^()]*\((\d+)";
var useridText = Regex.Match(extractedDocContent, pattern)?.Groups[1].Value;
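As a quick check of the non-lookbehind pattern against the sample text from the question, here is a small sketch. ClaimIdDemo and ExtractFirstNumber are hypothetical names used only for illustration. Since Regex.Match returns a Match object even when nothing matches, the sketch tests Success rather than relying on the null-conditional operator.

```csharp
using System;
using System.Text.RegularExpressions;

class ClaimIdDemo
{
    // Hypothetical helper: returns the first number after "Id: ... (", or an empty string.
    public static string ExtractFirstNumber(string text)
    {
        // [^()]* skips the name; \( is the literal opening parenthesis; (\d+) captures the digits.
        var m = Regex.Match(text, @"Id:[^()]*\((\d+)");
        return m.Success ? m.Groups[1].Value : string.Empty;
    }

    static void Main()
    {
        var doc = "Letter received for the claim\n" +
                  "Id: Sanjay Kumar (12345678 / NA123456789)\n" +
                  "Dear Customer find the report";
        Console.WriteLine(ExtractFirstNumber(doc)); // prints 12345678
    }
}
```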

Simplify regex code in C#: Add a space between a digit/decimal and unit

I have a regex code written in C# that basically adds a space between a number and a unit with some exceptions:
dosage_value = Regex.Replace(dosage_value, @"(\d)\s+", @"$1");
dosage_value = Regex.Replace(dosage_value, @"(\d)%\s+", @"$1%");
dosage_value = Regex.Replace(dosage_value, @"(\d+(\.\d+)?)", @"$1 ");
dosage_value = Regex.Replace(dosage_value, @"(\d)\s+%", @"$1% ");
dosage_value = Regex.Replace(dosage_value, @"(\d)\s+:", @"$1:");
dosage_value = Regex.Replace(dosage_value, @"(\d)\s+e", @"$1e");
dosage_value = Regex.Replace(dosage_value, @"(\d)\s+E", @"$1E");
Example:
10ANYUNIT
10:something
10 : something
10 %
40 e-5
40 E-05
should become
10 ANYUNIT
10:something
10: something
10%
40e-5
40E-05
Exceptions are: %, E, e and :.
I have tried, but since my regex knowledge is not top-notch, would someone be able to help me reduce this code with same expected results?
Thank you!
For your example data, you might use 2 capture groups, where the second group is in an optional part.
In the callback of the replace, check whether capture group 2 matched. If it did, use it in the replacement; otherwise add a space.
(\d+(?:\.\d+)?)(?:\s*([%:eE]))?
( Capture group 1
\d+(?:\.\d+)? match 1+ digits with an optional decimal part
) Close group 1
(?: Non capture group to match as a whole
\s*([%:eE]) Match optional whitespace chars, and capture 1 of % : e E in group 2
)? Close non capture group and make it optional
.NET regex demo
string[] strings = new string[]
{
"10ANYUNIT",
"10:something",
"10 : something",
"10 %",
"40 e-5",
"40 E-05",
};
string pattern = @"(\d+(?:\.\d+)?)(?:\s*([%:eE]))?";
var result = strings.Select(s =>
Regex.Replace(
s, pattern, m =>
m.Groups[1].Value + (m.Groups[2].Success ? m.Groups[2].Value : " ")
)
);
Array.ForEach(result.ToArray(), Console.WriteLine);
Output
10 ANYUNIT
10:something
10: something
10%
40e-5
40E-05
In .NET, \d can also match digits from other scripts, \s can also match a newline, and without a leading \b the match can start in the middle of a token, so a slightly more precise pattern is:
\b([0-9]+(?:\.[0-9]+)?)(?:[\p{Zs}\t]*([%:eE]))?
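The remark about \d is easy to verify. The sketch below compares \d with [0-9] on a string of Arabic-Indic digits; the class and method names are hypothetical. RegexOptions.ECMAScript is shown as an alternative way to restrict \d to ASCII digits.

```csharp
using System;
using System.Text.RegularExpressions;

class DigitClassDemo
{
    // \d in .NET matches any Unicode decimal digit by default.
    public static bool MatchesUnicodeDigits(string s) => Regex.IsMatch(s, @"^\d+$");

    // [0-9] matches ASCII digits only.
    public static bool MatchesAsciiDigits(string s) => Regex.IsMatch(s, @"^[0-9]+$");

    static void Main()
    {
        string arabicIndic = "\u0661\u0662\u0663"; // Arabic-Indic digits one, two, three

        Console.WriteLine(MatchesUnicodeDigits(arabicIndic)); // True
        Console.WriteLine(MatchesAsciiDigits(arabicIndic));   // False
        // With the ECMAScript option, \d behaves like [0-9].
        Console.WriteLine(Regex.IsMatch(arabicIndic, @"^\d+$", RegexOptions.ECMAScript)); // False
    }
}
```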
I think you need something like this:
dosage_value = Regex.Replace(dosage_value, @"(\d+(\.\d*)?)\s*((E|e|%|:)+)\s*", @"$1$3 ");
Group 1 - (\d+(\.\d*)?)
Any number like 123 1241.23
Group 3 - ((E|e|%|:)+)
Any of the special symbols E e % : (group 2 is the optional decimal part inside group 1)
Group 1 and Group 3 can be separated by any amount of whitespace.
If it's not working the way you are asking, please provide some samples to test.
For me it's too complex to be handled just by one regex. I suggest splitting into separate checks. See below code example - I used four different regexes, first is described in detail, the rest can be deduced based on first explanation.
using System.Text.RegularExpressions;
var testStrings = new string[]
{
"10mg",
"10:something",
"10 : something",
"10 %",
"40 e-5",
"40 E-05",
};
foreach (var testString in testStrings)
{
Console.WriteLine($"Input: '{testString}', parsed: '{RegexReplace(testString)}'");
}
string RegexReplace(string input)
{
// First look for exponential notation.
// Pattern is: match zero or more whitespaces \s*
// Then match one or more digits and store them in the first capturing group (\d+)
// Then match one or more whitespaces \s+
// Then match the exponent part ([eE][-+]?\d+) and store it in the second capturing group.
// It will match a lower or uppercase 'e' with an optional (due to the ? operator) dash/plus sign and one or more digits.
// Then match zero or more whitespaces.
var expForMatch = Regex.Match(input, @"\s*(\d+)\s+([eE][-+]?\d+)\s*");
if(expForMatch.Success)
{
return $"{expForMatch.Groups[1].Value}{expForMatch.Groups[2].Value}";
}
var matchWithColon = Regex.Match(input, @"\s*(\d+)\s*:\s*(\w+)");
if (matchWithColon.Success)
{
return $"{matchWithColon.Groups[1].Value}:{matchWithColon.Groups[2].Value}";
}
var matchWithPercent = Regex.Match(input, @"\s*(\d+)\s*%");
if (matchWithPercent.Success)
{
return $"{matchWithPercent.Groups[1].Value}%";
}
var matchWithUnit = Regex.Match(input, @"\s*(\d+)\s*(\w+)");
if (matchWithUnit.Success)
{
return $"{matchWithUnit.Groups[1].Value} {matchWithUnit.Groups[2].Value}";
}
return input;
}
Output is:
Input: '10mg', parsed: '10 mg'
Input: '10:something', parsed: '10:something'
Input: '10 : something', parsed: '10:something'
Input: '10 %', parsed: '10%'
Input: '40 e-5', parsed: '40e-5'
Input: '40 E-05', parsed: '40E-05'

Regex for Backus-Naur Form

I'm trying to make a regex to match a string like:
i<A> | n<B> | <C>
It needs to return the values:
("i", "A")
("n", "B")
("", "C")
Currently I'm using the following regex:
^([A-Za-z0-9]*)\<(.*?)\>
but it only matches the first pair ("i", "A").
I can't find a way to fix it.
The ^ asserts the position at the start of a line, so the pattern can only match the first pair. If you remove it, it should work; the * quantifier already allows the empty prefix in front of <C>. See the example below:
string pattern = @"([A-Za-z0-9]*)<(.*?)>";
string input = @"i<A> | n<B> | <C>";
RegexOptions options = RegexOptions.Multiline;
foreach (Match m in Regex.Matches(input, pattern, options))
{
Console.WriteLine("'{0}' found at index {1}.", m.Value, m.Index);
}
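To get the pairs themselves rather than the whole match text, read the two capture groups of each match. The following is a minimal sketch assuming an unanchored pattern with * quantifiers; BnfPairsDemo and ParsePairs are hypothetical names, and [^<>]* is swapped in for the lazy .*? so that a match cannot run across angle brackets.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

class BnfPairsDemo
{
    // Hypothetical helper: returns one (prefix, name) tuple per "prefix<name>" item.
    public static List<(string Prefix, string Name)> ParsePairs(string input)
    {
        var pairs = new List<(string Prefix, string Name)>();
        // No ^ anchor, so Regex.Matches finds every item in the string.
        foreach (Match m in Regex.Matches(input, @"([A-Za-z0-9]*)<([^<>]*)>"))
        {
            pairs.Add((m.Groups[1].Value, m.Groups[2].Value));
        }
        return pairs;
    }

    static void Main()
    {
        foreach (var (prefix, name) in ParsePairs("i<A> | n<B> | <C>"))
        {
            Console.WriteLine($"(\"{prefix}\", \"{name}\")");
        }
        // Prints:
        // ("i", "A")
        // ("n", "B")
        // ("", "C")
    }
}
```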
If the | is part of the string and should be matched, you can make use of the captures property with 2 capture groups with the same name.
^(?<first>[A-Za-z0-9]*)<(?<second>[^<>]*)>(?:\s+\|\s+(?<first>[A-Za-z0-9]*)<(?<second>[^<>]*)>)+$
The pattern matches:
^ Start of string
(?<first>[A-Za-z0-9]*) Named group first, optionally match any of the listed ranges
<(?<second>[^<>]*)> Match < then start named group second and match any char except < and > and match >
(?: Non capture group
\s+\|\s+(?<first>[A-Za-z0-9]*)<(?<second>[^<>]*)> Match a | between whitespace chars and the same pattern for both named groups
)+ Close group and repeat 1+ times
$ End of string
See a .NET regex demo | C# demo
Then you could for example create Tuples out of the matches to create the pairs.
string str = "i<A> | n<B> | <C>";
MatchCollection matches = Regex.Matches(str, @"^(?<first>[A-Za-z0-9]*)<(?<second>[^<>]*)>(?:\s+\|\s+(?<first>[A-Za-z0-9]*)<(?<second>[^<>]*)>)+$");
foreach (Match match in matches)
{
match.Groups["first"].Captures
.Select(c => c.Value)
.Zip(match.Groups["second"].Captures.Select(c => c.Value), (x, y) => Tuple.Create(x, y))
.ToList()
.ForEach(t => Console.WriteLine("first: {0}, second: {1}", t.Item1, t.Item2));
}
Output
first: i, second: A
first: n, second: B
first: , second: C

Removal of colon and carriage returns and replace with colon

I'm working on a project where I have an HTML fragment which needs to be cleaned up - the HTML has been removed and, as a result of a table being removed, there are some strange line ends where they shouldn't be :-)
the characters as they appear are
a space at the beginning of a line
a colon, carriage return and linefeed at the end of the line - which needs to be replaced simply with the colon;
I am presently using regex as follows:
s = Regex.Replace(s, @"(:[\r\n])", ":", RegexOptions.Multiline | RegexOptions.IgnoreCase);
// gets rid of the leading space
s = Regex.Replace(s, @"(^[( )])", "", RegexOptions.Multiline | RegexOptions.IgnoreCase);
Example of what I am dealing with:
Tomas Adams
Solicitor
APLawyers
p:
1800 995 718
f:
07 3102 9135
a:
22 Fultam Street
PO Box 132, Booboobawah QLD 4113
which should look like:
Tomas Adams
Solicitor
APLawyers
p:1800 995 718
f:07 3102 9135
a:22 Fultam Street
PO Box 132, Booboobawah QLD 4113
The regex above is my attempt to clean the string, but the result is far from perfect ... Can someone help me correct the error and achieve my goal ...
[EDIT]
the offending characters
f:\r\n07 3102 9135\r\na:\r\n22
the combination of :\r\n should be replaced by a single colon.
MTIA
Darrin
You may use
var result = Regex.Replace(s, @"(?m)^\s+|(?<=:)(?:\r?\n)+|(\r?\n){2,}", "$1");
See the .NET regex demo.
Details
(?m) - equal to RegexOptions.Multiline - makes ^ match the start of any line here
^ - start of a line
\s+ - 1+ whitespaces
| - or
(?<=:)(?:\r?\n)+ - a position that is immediately preceded with : (matched with (?<=:) positive lookbehind) followed with 1+ occurrences of an optional CR and LF (those are removed)
| - or
(\r?\n){2,} - two or more consecutive occurrences of an optional CR followed with an LF symbol. Only the last occurrence is saved in Group 1 memory buffer, thus the $1 replacement pattern inserts that last, single, occurrence.
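The three alternatives described above can be tried on a shortened version of the sample data. This is only a sketch: ContactCleanupDemo and Clean are hypothetical names, and the input assumes CRLF line ends with one leading space per line, as described in the question.

```csharp
using System;
using System.Text.RegularExpressions;

class ContactCleanupDemo
{
    // Hypothetical helper wrapping the single Regex.Replace call.
    public static string Clean(string s) =>
        Regex.Replace(s, @"(?m)^\s+|(?<=:)(?:\r?\n)+|(\r?\n){2,}", "$1");

    static void Main()
    {
        var raw = " APLawyers\r\n p:\r\n 1800 995 718\r\n f:\r\n 07 3102 9135";

        // Leading spaces are dropped and each "label:" line is joined with the line after it.
        Console.WriteLine(Clean(raw));
        // Prints:
        // APLawyers
        // p:1800 995 718
        // f:07 3102 9135
    }
}
```

When the first or second alternative matches, Group 1 does not participate, so $1 expands to an empty string and the matched text is simply removed.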
A basic solution without Regex:
var lines = input.Split(new[] { "\r\n", "\n" }, StringSplitOptions.RemoveEmptyEntries);
var output = new StringBuilder();
for (var i = 0; i < lines.Length; i++)
{
    var line = lines[i].Trim(); // remove space before or after a line
    // Merge a label such as "p:" into the following line, if there is one.
    if (line.EndsWith(":") && i + 1 < lines.Length) // feel free to also check for the size
    {
        lines[i + 1] = line + lines[i + 1].TrimStart();
        continue;
    }
    output.AppendLine(line);
}
Try it Online!
I tried to use your regular expression. I was able to replace "\n" and ":" with the following regular expression. It removes ":" and "\n" at the end of the line.
@"([:\r\n])"
A Linq solution without Regex:
var tmp = string.Empty;
var output = input.Split(new[] { "\r\n", "\n" }, StringSplitOptions.RemoveEmptyEntries)
    .Aggregate(new StringBuilder(), (a, b) => {
        var line = b.Trim(); // remove space before or after a line
        if (line.EndsWith(":")) { // feel free to also check for the size
            tmp = line; // hold the label until the next line arrives
        }
        else {
            a.AppendLine(tmp + line);
            tmp = string.Empty;
        }
        return a;
    });
Try it Online!

Extract string in parentheses using c#

I have a string ---TIMESTAMP Tue, 24 Oct 2017 02:11:56 -0400 [1508825516987]---
I want to get the value within the [] (i.e. 1508825516987)
How can I get the value using Regex?
Explanation of the pattern \[(.*?)\]:
\[ : [ is a meta char and needs to be escaped if you want to match it literally.
(.*?) : match everything in a non-greedy way and capture it.
\] : escaping ] is optional outside a character class in .NET, but it makes clear that a literal ] is meant.
Source of explanation: Click
static void Main(string[] args)
{
string txt = "---TIMESTAMP Tue, 24 Oct 2017 02:11:56 -0400 [1508825516987]---";
Regex regex = new Regex(@"\[(.*?)\]", RegexOptions.IgnoreCase | RegexOptions.Singleline);
Match match = regex.Match(txt);
if (match.Success)
{
for (int i = 1; i < match.Groups.Count; i++)
{
String extract = match.Groups[i].ToString();
Console.Write(extract.ToString());
}
}
Console.ReadLine();
}
Links to learn to create regular expressions:
Regexstorm
RegExr
Update 1:
Regex regex = new Regex(@"^---.*\[(.*?)\]", RegexOptions.IgnoreCase | RegexOptions.Singleline);
^ is start of string
--- are your (start) characters
.* is any char between --- and [
You can use the following code to get the result you want:
MatchCollection matches = Regex.Matches("---TIMESTAMP Tue, 24 Oct 2017 02:11:56 -0400 [1508825516987]---", @"\[(.*?)\]", RegexOptions.Singleline);
Match mat = matches[0];
string val = mat.Groups[1].Value;
The string val will then contain the value you need.
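If the bracketed value is always numeric, a stricter variant can capture digits only and convert the result in one step. This is a sketch under that assumption; BracketValueDemo and ExtractBracketedNumber are hypothetical names, and \d+ is swapped in for the lazy .*? used above.

```csharp
using System;
using System.Text.RegularExpressions;

class BracketValueDemo
{
    // Hypothetical helper: returns the number inside [...], or null when there is none.
    public static long? ExtractBracketedNumber(string text)
    {
        // \[ and \] match the literal brackets; (\d+) captures the digits between them.
        var m = Regex.Match(text, @"\[(\d+)\]");
        return m.Success && long.TryParse(m.Groups[1].Value, out var n) ? n : (long?)null;
    }

    static void Main()
    {
        var line = "---TIMESTAMP Tue, 24 Oct 2017 02:11:56 -0400 [1508825516987]---";
        Console.WriteLine(ExtractBracketedNumber(line)); // prints 1508825516987
    }
}
```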
