Extract string in parentheses using c#

I have a string ---TIMESTAMP Tue, 24 Oct 2017 02:11:56 -0400 [1508825516987]---
I want to get the value within the [] (i.e. 1508825516987)
How can I get the value using Regex?

Explanation of the pattern \[(.*?)\]:
\[ : [ is a metacharacter and needs to be escaped if you want to match it literally.
(.*?) : matches everything in a non-greedy way and captures it in group 1.
\] : matches a literal ]. Escaping it mirrors the opening bracket, although .NET also treats an unescaped ] outside a character class as a literal.
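using System;
using System.Text.RegularExpressions;

class Program
{
    static void Main(string[] args)
    {
        string txt = "---TIMESTAMP Tue, 24 Oct 2017 02:11:56 -0400 [1508825516987]---";
        // @"..." is a verbatim string, so the backslashes reach the regex engine unchanged.
        Regex regex = new Regex(@"\[(.*?)\]", RegexOptions.IgnoreCase | RegexOptions.Singleline);
        Match match = regex.Match(txt);
        if (match.Success)
        {
            // Group 0 is the whole match; the captured groups start at index 1.
            for (int i = 1; i < match.Groups.Count; i++)
            {
                string extract = match.Groups[i].Value;
                Console.Write(extract);
            }
        }
        Console.ReadLine();
    }
}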
Links to learn to create regular expressions:
Regexstorm
RegExr
Update 1:
Regex regex = new Regex(@"^---.*\[(.*?)\]", RegexOptions.IgnoreCase | RegexOptions.Singleline);
^ is start of string
--- are your (start) characters
.* is any char between --- and [

You can use the following code to get the result you want:
MatchCollection matches = Regex.Matches("---TIMESTAMP Tue, 24 Oct 2017 02:11:56 -0400 [1508825516987]---", @"\[(.*?)\]", RegexOptions.Singleline);
Match mat = matches[0];
string val = mat.Groups[1].Value;
The string val will then contain the value you need.
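
The answers above can be folded into a small helper. This is only a minimal sketch, assuming the bracketed value is always a run of digits (so \d+ stands in for .*?); the class and method names are made up for illustration.

```csharp
using System;
using System.Text.RegularExpressions;

public static class BracketValue
{
    // Hypothetical helper: returns the digits inside the first [...] pair,
    // or null when the input contains no such pair.
    public static string Extract(string input)
    {
        Match match = Regex.Match(input, @"\[(\d+)\]");
        return match.Success ? match.Groups[1].Value : null;
    }
}
```

Calling BracketValue.Extract(txt) on the string from the question should give back the timestamp digits, and checking for null avoids indexing into an empty MatchCollection.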

Related

Removal of colon and carriage returns and replace with colon

I'm working on a project where I have an HTML fragment which needs to be cleaned up. The HTML has been removed and, as a result of a table being removed, there are some stray line ends where they shouldn't be :-)
The characters as they appear are:
a space at the beginning of a line
a colon, carriage return and line feed at the end of the line, which needs to be replaced simply with the colon
I am presently using regex as follows:
s = Regex.Replace(s, @"(:[\r\n])", ":", RegexOptions.Multiline | RegexOptions.IgnoreCase);
// gets rid of the leading space
s = Regex.Replace(s, @"(^[( )])", "", RegexOptions.Multiline | RegexOptions.IgnoreCase);
Example of what I am dealing with:
Tomas Adams
Solicitor
APLawyers
p:
1800 995 718
f:
07 3102 9135
a:
22 Fultam Street
PO Box 132, Booboobawah QLD 4113
which should look like:
Tomas Adams
Solicitor
APLawyers
p:1800 995 718
f:07 3102 9135
a:22 Fultam Street
PO Box 132, Booboobawah QLD 4113
The regex above is my attempt to clean the string, but the result is far from perfect. Can someone help me correct the error and achieve my goal?
[EDIT]
the offending characters
f:\r\n07 3102 9135\r\na:\r\n22
the combination of :\r\n should be replaced by a single colon.
MTIA
Darrin

You may use
var result = Regex.Replace(s, @"(?m)^\s+|(?<=:)(?:\r?\n)+|(\r?\n){2,}", "$1");
See the .NET regex demo.
Details
(?m) - equal to RegexOptions.Multiline - makes ^ match the start of any line here
^ - start of a line
\s+ - 1+ whitespaces
| - or
(?<=:)(?:\r?\n)+ - a position that is immediately preceded with : (matched with (?<=:) positive lookbehind) followed with 1+ occurrences of an optional CR and LF (those are removed)
| - or
(\r?\n){2,} - two or more consecutive occurrences of an optional CR followed with an LF symbol. Only the last occurrence is saved in Group 1 memory buffer, thus the $1 replacement pattern inserts that last, single, occurrence.
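
To see the three alternatives working together, here is a small sketch that wraps the pattern in a method. The FragmentCleaner class and Clean method are names invented for this example, not part of the answer.

```csharp
using System;
using System.Text.RegularExpressions;

public static class FragmentCleaner
{
    // Hypothetical wrapper around the pattern discussed above:
    // 1) leading whitespace at the start of any line is dropped,
    // 2) line breaks directly after a colon are dropped,
    // 3) runs of two or more line breaks collapse to the last one ($1).
    public static string Clean(string s)
    {
        return Regex.Replace(s, @"(?m)^\s+|(?<=:)(?:\r?\n)+|(\r?\n){2,}", "$1");
    }
}
```

Fed a fragment such as " p:\r\n 1800 995 718\r\n f:\r\n 07 3102 9135", the method should join each label with the value on the following line and strip the leading spaces.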

A basic solution without Regex:
// requires: using System; using System.Text;
var lines = input.Split(new[] { "\r\n", "\n" }, StringSplitOptions.RemoveEmptyEntries);
var output = new StringBuilder();
for (var i = 0; i < lines.Length; i++)
{
    var line = lines[i].Trim(); // remove spaces before or after a line
    // a line ending with a colon is merged into the next one, if there is one
    if (line.EndsWith(":") && i + 1 < lines.Length) // feel free to also check for the size
    {
        lines[i + 1] = line + lines[i + 1].Trim();
        continue;
    }
    output.AppendLine(line);
}

I tried to use your regular expression. I was able to replace "\n" and ":" with the following regular expression. It removes ":" and "\n" at the end of the line.
@"([:\r\n])"

A LINQ solution without Regex:
// requires: using System; using System.Linq; using System.Text;
var tmp = string.Empty;
var output = input.Split(new[] { "\r\n", "\n" }, StringSplitOptions.RemoveEmptyEntries).Aggregate(new StringBuilder(), (a, b) => {
    var line = b.Trim(); // remove spaces before or after a line
    if (line.EndsWith(":")) { // feel free to also check for the size
        tmp = line; // hold the label until the next line arrives
    }
    else {
        a.AppendLine(tmp + line);
        tmp = string.Empty;
    }
    return a;
});

C# Regex to match all occurrences of a pattern and replace with empty string

I am trying to match the pattern <two alpha chars>single space<two digits>single space<two digits> and remove all occurrences of it from a string.
var myRegex = @"(?:^|[\s]|[, ]|[.]|[\n]|[\t])([A-Za-z]{2}\s[0-9]{2}\s[0-9]{2})($|[,]|[.]|[\s]|[\n]|[\t])";
string myString = "this 02 34, HU 23 76 , hh 76 745 1.HO 12 33. HO 34 56";
var matches = Regex.Matches(myString, myRegex);
foreach (Match match in matches)
{
myString = myString.Replace(match.Value, "");
}
In the variable myString above, "this 02 34" will not match, as there is no space, period, comma, new line or tab before the two letters. This is the expected behavior.
But "HO 34 56" does not match either, as it does not end with a space, period, comma, new line or tab. How can I include it in the match without also matching "hh 76 745"?
After executing the above code, I expect the myString variable to contain "this 02 34, , hh 76 745 1.. "

Use this regex with word boundaries:
\b[A-Za-z]{2}\s[0-9]{2}\s[0-9]{2}\b
See the regex demo
Details:
\b - a leading word boundary
[A-Za-z]{2} - 2 alpha
\s - a whitespace
[0-9]{2} - 2 digits
\s - a whitespace
[0-9]{2} - 2 digits
\b - a trailing word boundary.
If you need to say "not preceded with alpha", replace the first \b with (?<![a-zA-Z]), and if you want to say "not followed with digit", replace the last \b with (?!\d). That is, use lookarounds which, like word boundaries, are zero-width assertions.
If you are really after matching that chunk only when it is preceded by a space, period, comma, new line, tab or the beginning of the string, and followed by one of those or the end of the string, use
(?<=^|[\s,.])[A-Za-z]{2}\s[0-9]{2}\s[0-9]{2}(?=$|[\s,.])
See this demo
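
As a rough sketch of how the word-boundary pattern could be applied, Regex.Replace can remove every occurrence in one call instead of looping over the matches and calling string.Replace as in the question. ChunkRemover and RemoveChunks are hypothetical names used only for this example.

```csharp
using System;
using System.Text.RegularExpressions;

public static class ChunkRemover
{
    // Removes every "<two letters> <two digits> <two digits>" chunk that is
    // delimited by word boundaries on both sides.
    public static string RemoveChunks(string input)
    {
        return Regex.Replace(input, @"\b[A-Za-z]{2}\s[0-9]{2}\s[0-9]{2}\b", string.Empty);
    }
}
```

With the sample string from the question, "HU 23 76", "HO 12 33" and "HO 34 56" should disappear, while "this 02 34" and "hh 76 745" stay because no word boundary sits where the pattern needs one.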

need help in finding pattern in string using Regular Expression using C#

I have a string in following format..
"ABC 12.23-22-22-11|-ABC 33.20-ABC 44.00-ABC 11.00|ABC 12.23-22-22-11|-ABC 33.20-ABC 44.00-ABC11.00|ABC 12.23-22-22-11|-ABC 33.20-ABC 44.00-ABC 11.00";
What I am trying to do is find the next group which starts with pipe but is not followed by a -
So the above string will point to 3 sections such as
ABC 12.23-22-22-11|-ABC 33.20-ABC 44.00-ABC 11.00
ABC 12.23-22-22-11|-ABC 33.20-ABC 44.00-ABC 11.00
ABC 12.23-22-22-11|-ABC 33.20-ABC 44.00-ABC 11.00
I played around with the following code but it doesn't seem to do anything: it does not give me the position of the next block, that is, of the next pipe character which is not followed by a dash (-).
String pattern = @"^+|[A-Z][A-Z][A-Z]$";
In the above my logic is
1:Start from the beginning
2:Find a pipe character which is not followed by a dash char
3:Return its position
4:Which I will eventually use to substring the blocks
5:And do this till the end of the string
Pls be kind as I have no idea how regex works, I am just making an attempt to use it. Thanks, language is C#

You can use the Regex.Split method with a pattern of \|(?!-).
Notice that you need to escape the | character since it's a metacharacter in regex that is used for alternation. The (?!-) is a negative look-ahead that rejects any | that is immediately followed by a dash.
var pattern = @"\|(?!-)";
var results = Regex.Split(input, pattern);
foreach (var match in results) {
Console.WriteLine(match);
}
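
Since the question also asks for the position of each delimiter, the same pattern can be used with Regex.Matches. The following is a sketch only; PipeFinder and SplitPositions are names invented here.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

public static class PipeFinder
{
    // Returns the zero-based index of every | that is not
    // immediately followed by a dash.
    public static List<int> SplitPositions(string input)
    {
        var positions = new List<int>();
        foreach (Match m in Regex.Matches(input, @"\|(?!-)"))
        {
            positions.Add(m.Index);
        }
        return positions;
    }
}
```

The returned indices could then be passed to Substring if you prefer to cut the blocks yourself rather than rely on Regex.Split.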

My Regex logic for this was:
the delimiter is pipe "[|]"
we will gather a series of characters that are not our delimiter
"(" not our delimiter ")" but at least one character "+"
"[^|]" is not our delimiter
"[|][-]" is also not our delimiter
Variable "pattern" could use a "*" instead of "+" if empty segments are acceptable. The pattern ends with a "?" since our final string segment (in your example) does not have a pipe character.
using System;
using System.Diagnostics;
using System.Text.RegularExpressions;

namespace ConsoleTest1
{
    class Program
    {
        static void Main(string[] args)
        {
            var input = "ABC 12.23-22-22-11|-ABC 33.20-ABC 44.00-ABC 11.00|ABC 12.23-22-22-11|-ABC 33.20-ABC 44.00-ABC11.00|ABC 12.23-22-22-11|-ABC 33.20-ABC 44.00-ABC 11.00";
            var pattern = "([^|]|[|][-])+[|]?";
            Match m;
            m = Regex.Match(input, pattern);
            while (m.Success) {
                Debug.WriteLine(String.Format("Match from {0} for {1} characters", m.Index, m.Length));
                Debug.WriteLine(input.Substring(m.Index, m.Length));
                m = m.NextMatch();
            }
        }
    }
}
Output is:
Match from 0 for 50 characters
ABC 12.23-22-22-11|-ABC 33.20-ABC 44.00-ABC 11.00|
Match from 50 for 49 characters
ABC 12.23-22-22-11|-ABC 33.20-ABC 44.00-ABC11.00|
Match from 99 for 49 characters
ABC 12.23-22-22-11|-ABC 33.20-ABC 44.00-ABC 11.00

How to trim whitespaces inside regex replacement string

I have a regex match string as follows:
public static string RegExMatchString = "(?<NVE>.{20})(?<SN>.{20})(?<REGION>.{4})(?<YY>\\d{4})(?<Mo" +
"n>\\d{2})(?<DD>\\d{1,2})(?<HH>\\d{2})(?<Min>\\d{2})(?<SS>\\d" +
"{2}).{6}(?<USER>.{10})(?<SCANTYPE>.{2})(?<IN>.{4})(?<OU" +
"T>.{4})(?<DISPO>.{2})(?<ROUTE>.{7})(?<LP>.{16})(?<POOL>.{3})" +
"(?<CONT>.{9})(?<REGION_L>.{18})(?<CAT>.{2})";
And I'm replacing it with:
public string RegExReplacementString = "LogBarcodeID ( \"${NVE}\", ID2: \"${SN}\", Scanner: \"${USER}" +
"\", AreaName: \"${REGION_L}${CAT}${SCANTYPE}\", TimeStamp: \"${YY}/${Mon}/${D" +
"D} ${HH}:${Min}:${SS} \") ";
I need to remove all leading and trailing whitespace from these three variables:
${REGION_L}
${CAT}
${SCANTYPE}
How should I change RegExReplacementString (or maybe RegExMatchString) so that this can be achieved?
Sample input is:
0034025876080795786104041811071 135 20150304111404 DFRANZ 61 9990020569910 DA ST6007 135 F
Currently I'm getting the related part as
AreaName: "135 F61", however I need to get AreaName: "135F61"
EDIT:
I'm reading the regex match string from a text file and initializing the regex:
RegExMatchString = File.ReadAllText(regexMatchStringPath);
regex = new Regex( RegExMatchString ,
RegexOptions.IgnoreCase | RegexOptions.CultureInvariant
| RegexOptions.IgnorePatternWhitespace | RegexOptions.Compiled
);
string replaced = regex.Replace("0034025876080795786104041811071 135 20150304111404 DFRANZ 61 9990020569910 DA ST6007 135 F", RegExReplacementString);

I think the fixed length of each field is useful for solving the problem here.
Use a regex like "^(.{20})(.{10})(.{2})(.{2})(.{2})$" to isolate each field.
This is an example with 5 fields that you know are of
length 20, length 10, length 2, length 2 and length 2.
Then use some LINQ and C# to get a list of (trimmed) fields.
Example:
// requires: using System.Linq; using System.Text.RegularExpressions;
var testRegex = "^(.{20})(.{10})(.{2})(.{2})(.{2})$";
// the first field is padded with spaces to its full width of 20 characters
var testData = "Field of length 20".PadRight(20) + "FieldLen10123456";
var fields = Regex.Match(testData, testRegex).Groups.Cast<Group>().Skip(1).Select(i => i.Value.Trim());
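
If you want to keep the named groups from the question, another option is to build the replacement in a MatchEvaluator, where each group value can be trimmed before it is concatenated. A replacement string like "${REGION_L}" cannot call Trim, but a delegate can. The sketch below uses a shortened, hypothetical pattern with only three fields (widths 6, 2 and 2 are assumptions for the example, not the real layout).

```csharp
using System;
using System.Text.RegularExpressions;

public static class TrimmedReplace
{
    // Hypothetical, shortened pattern: three fixed-width fields only.
    private static readonly Regex FieldPattern =
        new Regex(@"(?<REGION_L>.{6})(?<CAT>.{2})(?<SCANTYPE>.{2})");

    // Builds the AreaName part with every group trimmed before joining.
    public static string Format(string input)
    {
        return FieldPattern.Replace(input, m =>
            "AreaName: \"" +
            m.Groups["REGION_L"].Value.Trim() +
            m.Groups["CAT"].Value.Trim() +
            m.Groups["SCANTYPE"].Value.Trim() + "\"");
    }
}
```

The same idea should carry over to the full pattern: keep RegExMatchString as it is and replace the RegExReplacementString argument with a lambda that formats the groups.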

Regex string for phrase

I am trying to extract a ticket number with the phrase "Ticket ID: (20 digit number)"
This phrase can be written as:
"Ticket ID: (20 digit number)"
"TicketID:(20 digit number)" - Spaces do not matter
Here is the regex string I am using that fails to work. I am trying to understand what I am doing wrong here. This regex should look for any phrase, regardless of spacing, with the word Ticket followed by ID: followed by a 20 digit number of any kind.
Regex newExpression = new Regex(@"\bTicket\b.*\bID:\b.*\d{20}",
RegexOptions.IgnoreCase
| RegexOptions.Singleline
| RegexOptions.IgnorePatternWhitespace);

With this pattern you obtain directly the number since the lookbehind (?<=..) is just a check and is not in the match result:
Regex newExpression = new Regex(@"(?<=\bTicket\s*ID\s*:\s*)\d{20}",
RegexOptions.IgnoreCase);

A word boundary doesn't occur after a : unless a word character follows it directly, so \bID:\b fails when a space comes next. Just use \s* to ignore spaces:
Regex newExpression = new Regex(@"Ticket\s*ID:\s*(\d{20})");
Now you can use newExpression.Match(someString).Groups[1].Value.
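
Put together, a minimal sketch might look like this. It assumes the number directly follows the colon (optionally after spaces) and that case should be ignored; TicketParser and FindTicketId are made-up names.

```csharp
using System;
using System.Text.RegularExpressions;

public static class TicketParser
{
    // \s* tolerates any amount of whitespace between the parts,
    // so "TicketID:" and "Ticket ID: " are both accepted.
    private static readonly Regex TicketPattern =
        new Regex(@"Ticket\s*ID\s*:\s*(\d{20})", RegexOptions.IgnoreCase);

    // Returns the 20-digit number, or null when the phrase is not found.
    public static string FindTicketId(string text)
    {
        Match m = TicketPattern.Match(text);
        return m.Success ? m.Groups[1].Value : null;
    }
}
```

Note that \d{20} only requires at least 20 digits in a row, so a longer number would yield its first 20 digits; add (?!\d) after the group if that matters.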

Just skip everything before the : or :, like
@".*\s*\d{20}"

Well, you can try this code. It captures the 20 digit number as named capture group in Regex:
var newExpression = new Regex(//@"\bTicket\b.*\bID:\b.*\d{20}",
@"Ticket\s*ID\:\s*(?<num>\(\d{20}\))",
RegexOptions.IgnoreCase
| RegexOptions.Singleline
| RegexOptions.IgnorePatternWhitespace);
var items=new List<string>();
var r=new Random();
var sb=new StringBuilder();
var i=0;
var formats=new []{"TicketID:({0})", "Ticket ID:({0})", "Ticket ID: ({0})",
"Ticket ID: ({0})"};
for(;i<5;i++){
for(var j=0;j<20;j++){
sb.Append(r.Next(0,9));
}
items.Add(string.Format(formats[r.Next(0,4)],sb));
sb.Remove(0,20);
}
for(;i<10;i++){
for(var j=0;j<20;j++){
sb.Append(r.Next(0,9));
}
items.Add(string.Format(formats[r.Next(0,2)],sb));
sb.Remove(0,20);
}
for(;i<15;i++){
for(var j=0;j<20;j++){
sb.Append(r.Next(0,9));
}
items.Add(string.Format(formats[r.Next(0,2)],sb));
sb.Remove(0,20);
}
foreach(var s in items){
var m = newExpression.Match(s);
if(m.Success && m.Groups!=null && m.Groups.Count>0){
string.Format("{0} - {1}",s,m.Groups["num"].Value).Dump();
}
}
NOTE: This was run in LINQPad.
