How to trim whitespaces inside regex replacement string - c#

I have a regex match string as follows:
public static string RegExMatchString = "(?<NVE>.{20})(?<SN>.{20})(?<REGION>.{4})(?<YY>\\d{4})(?<Mo" +
"n>\\d{2})(?<DD>\\d{1,2})(?<HH>\\d{2})(?<Min>\\d{2})(?<SS>\\d" +
"{2}).{6}(?<USER>.{10})(?<SCANTYPE>.{2})(?<IN>.{4})(?<OU" +
"T>.{4})(?<DISPO>.{2})(?<ROUTE>.{7})(?<LP>.{16})(?<POOL>.{3})" +
"(?<CONT>.{9})(?<REGION_L>.{18})(?<CAT>.{2})";
And I'm replacing it as
public string RegExReplacementString = "LogBarcodeID ( \"${NVE}\", ID2: \"${SN}\", Scanner: \"${USER}" +
"\", AreaName: \"${REGION_L}${CAT}${SCANTYPE}\", TimeStamp: \"${YY}/${Mon}/${D" +
"D} ${HH}:${Min}:${SS} \") ";
I need to remove all leading and trailing whitespace from these three variables:
${REGION_L}
${CAT}
${SCANTYPE}
How should I change RegExReplacementString (or maybe RegExMatchString) so that this can be achieved?
Sample input is:
0034025876080795786104041811071 135 20150304111404 DFRANZ 61 9990020569910 DA ST6007 135 F
Currently I get the relevant part as
AreaName: "135 F61", but I need AreaName: "135F61"
EDIT:
I'm reading the regex match string from a text file and initializing the regex:
RegExMatchString = File.ReadAllText(regexMatchStringPath);
regex = new Regex( RegExMatchString ,
RegexOptions.IgnoreCase | RegexOptions.CultureInvariant
| RegexOptions.IgnorePatternWhitespace | RegexOptions.Compiled
);
string replaced = regex.Replace("0034025876080795786104041811071 135 20150304111404 DFRANZ 61 9990020569910 DA ST6007 135 F", RegExReplacementString);

I think the fixed-length information for each field is useful for solving the problem here.
Use a regex like "^(.{20})(.{10})(.{2})(.{2})(.{2})$" to isolate each field.
This example is for 5 fields that you know have
Length 20, Length 10, Length 2, Length 2, Length 2.
Then use some LINQ and C# to get a list of (trimmed) fields.
Example:
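var testRegex = "^(.{20})(.{10})(.{2})(.{2})(.{2})$";
// 36 characters: a 20-character field (padded with two spaces), a 10-character field, then three 2-character fields
var testData = "Field of length 20  FieldLen10123456";
// Groups[0] is the whole match, so skip it and trim each captured field
var fields = Regex.Match(testData, testRegex).Groups.Cast<Group>().Skip(1).Select(i => i.Value.Trim());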
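To get the trimmed values directly into the replacement text, one option is the Regex.Replace overload that takes a MatchEvaluator instead of a replacement string, so each group can be trimmed in code. The sketch below is a minimal illustration, not the asker's actual layout: the three-field pattern with widths 6, 2 and 2, the class name TrimmedReplaceSketch and the sample input are assumptions made to keep it short.

```csharp
using System;
using System.Text.RegularExpressions;

public static class TrimmedReplaceSketch
{
    // Hypothetical three-field layout (widths 6, 2, 2) standing in for the full pattern.
    private static readonly Regex Rx = new Regex(
        @"(?<REGION_L>.{6})(?<CAT>.{2})(?<SCANTYPE>.{2})");

    public static string Format(string input)
    {
        // The lambda is a MatchEvaluator: it builds the replacement for each match,
        // so every group value can be trimmed before it is concatenated.
        return Rx.Replace(input, m =>
            "AreaName: \"" +
            m.Groups["REGION_L"].Value.Trim() +
            m.Groups["CAT"].Value.Trim() +
            m.Groups["SCANTYPE"].Value.Trim() + "\"");
    }

    public static void Main()
    {
        // "135" + three spaces fills REGION_L, "F " fills CAT, "61" fills SCANTYPE.
        Console.WriteLine(Format("135   F 61")); // prints AreaName: "135F61"
    }
}
```

The same idea should carry over to the full pattern: keep RegExMatchString as it is and move the formatting from RegExReplacementString into the evaluator.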


Removal of colon and carriage returns and replace with colon

I'm working on a project where I have an HTML fragment which needs to be cleaned up. The HTML has been removed, and as a result of a table being removed there are some strange line ends where they shouldn't be :-)
the characters as they appear are
a space at the beginning of a line
a colon, carriage return and linefeed at the end of the line - which needs to be replaced simply with the colon;
I am presently using regex as follows:
s = Regex.Replace(s, @"(:[\r\n])", ":", RegexOptions.Multiline | RegexOptions.IgnoreCase);
// gets rid of the leading space
s = Regex.Replace(s, @"(^[( )])", "", RegexOptions.Multiline | RegexOptions.IgnoreCase);
Example of what I am dealing with:
Tomas Adams
Solicitor
APLawyers
p:
1800 995 718
f:
07 3102 9135
a:
22 Fultam Street
PO Box 132, Booboobawah QLD 4113
which should look like:
Tomas Adams
Solicitor
APLawyers
p:1800 995 718
f:07 3102 9135
a:22 Fultam Street
PO Box 132, Booboobawah QLD 4113
The regex above is my attempt to clean the string, but the result is far from perfect. Can someone help me correct the error and achieve my goal?
[EDIT]
the offending characters
f:\r\n07 3102 9135\r\na:\r\n22
the combination of :\r\n should be replaced by a single colon.
MTIA
Darrin
You may use
var result = Regex.Replace(s, @"(?m)^\s+|(?<=:)(?:\r?\n)+|(\r?\n){2,}", "$1");
See the .NET regex demo.
Details
(?m) - equal to RegexOptions.Multiline - makes ^ match the start of any line here
^ - start of a line
\s+ - 1+ whitespaces
| - or
(?<=:)(?:\r?\n)+ - a position that is immediately preceded with : (matched with (?<=:) positive lookbehind) followed with 1+ occurrences of an optional CR and LF (those are removed)
| - or
(\r?\n){2,} - two or more consecutive occurrences of an optional CR followed with an LF symbol. Only the last occurrence is saved in Group 1 memory buffer, thus the $1 replacement pattern inserts that last, single, occurrence.
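As a quick check of the alternation described above, here is a small sketch that runs the pattern on a fragment shaped like the question's data. The class name CleanFragment and the sample string are assumptions for illustration.

```csharp
using System;
using System.Text.RegularExpressions;

public static class CleanFragment
{
    public static string Clean(string s)
    {
        // Group 1 only participates in the third alternative; for the first two
        // alternatives $1 expands to an empty string, so the match is simply removed.
        return Regex.Replace(s, @"(?m)^\s+|(?<=:)(?:\r?\n)+|(\r?\n){2,}", "$1");
    }

    public static void Main()
    {
        string raw = " p:\r\n1800 995 718\r\n f:\r\n07 3102 9135";
        // Leading spaces are dropped and each "label:" is joined with the line after it.
        Console.WriteLine(Clean(raw));
    }
}
```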
A basic solution without Regex:
// split on both CRLF and LF so that no stray "\r" is left at the end of a line
var lines = input.Split(new[] { "\r\n", "\n" }, StringSplitOptions.RemoveEmptyEntries);
var output = new StringBuilder();
for (var i = 0; i < lines.Length; i++)
{
    var line = lines[i].Trim(); // remove space before or after a line
    if (line.EndsWith(":") && i + 1 < lines.Length) // feel free to also check for the size
    {
        lines[i + 1] = line + lines[i + 1].Trim(); // prepend the label to the next line
        continue;
    }
    output.AppendLine(line);
}
I tried your regular expression. I was able to replace "\n" and ":" with the following regular expression. It removes ":" and "\n" at the end of the line.
@"([:\r\n])"
A Linq solution without Regex:
var tmp = string.Empty;
// split on both CRLF and LF so that no stray "\r" is left at the end of a line
var output = input.Split(new[] { "\r\n", "\n" }, StringSplitOptions.RemoveEmptyEntries).Aggregate(new StringBuilder(), (a, b) => {
    b = b.Trim(); // remove space before or after a line
    if (b.EndsWith(":")) { // feel free to also check for the size
        tmp = b; // hold the label until the next line arrives
    }
    else {
        a.AppendLine(tmp + b);
        tmp = string.Empty;
    }
    return a;
});

c# Regex - Only get numbers and whitespaces in one string and only text and whitespaces in another

How do I only get numbers and include whitespaces in one string and only text and white spaces in another?
I've tried this:
string value1 = "123 45 New York";
string result1 = Regex.Match(value1, @"^[\w\s]*$").Value;
string value2 = "123 45 New York";
string result2 = Regex.Match(value2, @"^[\w\s]*$").Value;
result1 needs to be "123 45"
result2 needs to be " New York"
Try the following code:
string value1 = "123 45 New York";
string digitsAndSpaces = Regex.Match(value1, @"([0-9 ]+)").Value;
string value2 = "123 45 New York";
string lettersAndSpaces = Regex.Match(value2, @"([A-Za-z ])+([A-Za-z ]+)").Value;
Update:
How do I allow characters like å ä ö in the result from value2?
string value3 = "å ä ö";
string speclettersAndSpaces = Regex.Match(value3, @"([a-zÀ-ÿ ])+([a-zÀ-ÿ ]+)").Value;
The following regex allows only digits with spaces between them, and likewise only letters with spaces between them.
Regex: (?:\d[0-9 ]*\d)|(?:[A-Za-z][A-Za-z ]*[A-Za-z])
Details:
(?:) Non-capturing group
\d matches a digit (equal to [0-9])
[] Match a single character present in the list
* Matches between zero and unlimited times
| or
Output:
Match 1
Full match 0-6 `123 45`
Match 2
Full match 7-15 `New York`
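The two alternatives can be collected in one pass with Regex.Matches. The sketch below is a variation on the pattern above rather than the answerer's exact regex: \p{L} (any Unicode letter) is swapped in for [A-Za-z] so that å, ä and ö are covered too, and the class name SplitDigitsAndLetters is an assumption.

```csharp
using System;
using System.Linq;
using System.Text.RegularExpressions;

public static class SplitDigitsAndLetters
{
    // Each run must start and end with a digit (or a letter), so the
    // separating spaces at the edges are never part of a match.
    private static readonly Regex Rx = new Regex(@"\d[\d ]*\d|\p{L}[\p{L} ]*\p{L}");

    public static string[] Parts(string value)
    {
        return Rx.Matches(value).Cast<Match>().Select(m => m.Value).ToArray();
    }

    public static void Main()
    {
        foreach (string part in Parts("123 45 New York"))
        {
            Console.WriteLine("'" + part + "'"); // prints '123 45' then 'New York'
        }
    }
}
```

Because each alternative needs a first and a last character, a run of a single digit or a single letter is not matched by this pattern.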

Regex with optional matching groups

I'm trying to parse a given string which is a kind of path separated with /. I need to write a regex that matches each segment in the path to a corresponding regex group.
Example 1:
input:
/EAN/SomeBrand/appliances/refrigerators/RF444
output:
Group: producer, Value: SomeBrand
Group: category, Value: appliances
Group: subcategory, Value: refrigerators
Group: product, Value: RF444
Example 2:
input:
/EAN/SomeBrand/appliances
output:
Group: producer, Value: SomeBrand
Group: category, Value: appliances
Group: subcategory, Value:
Group: product, Value:
I tried the following code. It works fine when the path is complete (as in the first example) but fails to find the groups when the input string is partial (as in example 2).
static void Main()
{
var pattern = @"^" + @"/EAN"
+ @"/" + @"(?<producer>.+)"
+ @"/" + @"(?<category>.+)"
+ @"/" + @"(?<subcategory>.+)"
+ @"/" + @"(?<product>.+)?"
+ @"$";
var rgx = new Regex(pattern, RegexOptions.Compiled | RegexOptions.IgnoreCase);
var result = rgx.Match(@"/EAN/SomeBrand/appliances/refrigerators/RF444");
foreach (string groupName in rgx.GetGroupNames())
{
Console.WriteLine(
"Group: {0}, Value: {1}",
groupName,
result.Groups[groupName].Value);
}
Console.ReadLine();
}
Any suggestion is welcome. Unfortunately I cannot simply split the string since the framework I'm using expects regex object.
You can use optional groups (...)? and replace the .+ greedy dot matching patterns with negated character classes [^/]+:
^/EAN/(?<producer>[^/]+)/(?<category>[^/]+)(/(?<subcategory>[^/]+))?(/(?<product>[^/]+))?$
                                           ^                      ^^^                  ^^
See the regex demo
This is how you need to declare your regex in the C# code:
var pattern = @"^" + @"/EAN"
+ @"/(?<producer>[^/]+)"
+ @"/(?<category>[^/]+)"
+ @"(/(?<subcategory>[^/]+))?"
+ @"(/(?<product>[^/]+))?"
+ @"$";
var rgx = new Regex(pattern, RegexOptions.Compiled | RegexOptions.IgnoreCase | RegexOptions.ExplicitCapture);
Note I am using regular capturing groups as optional ones, but the RegexOptions.ExplicitCapture flag turns all non-named capturing groups into non-capturing and thus, they do not appear among the Match.Groups. So, we only have 5 groups all the time even without using non-capturing optional groups (?:...)?.
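A minimal sketch of the pattern above applied to both the complete and the partial path follows. The class name PathGroups and the Describe helper are assumptions added for illustration; the pattern and options are the ones from this answer.

```csharp
using System;
using System.Linq;
using System.Text.RegularExpressions;

public static class PathGroups
{
    private static readonly Regex Rx = new Regex(
        @"^/EAN/(?<producer>[^/]+)/(?<category>[^/]+)(/(?<subcategory>[^/]+))?(/(?<product>[^/]+))?$",
        RegexOptions.IgnoreCase | RegexOptions.ExplicitCapture);

    // Returns "name=value" pairs joined with '|'; a group that did not
    // participate in the match yields an empty value.
    public static string Describe(string path)
    {
        Match m = Rx.Match(path);
        if (!m.Success) return "no match";
        return string.Join("|", new[] { "producer", "category", "subcategory", "product" }
            .Select(n => n + "=" + m.Groups[n].Value));
    }

    public static void Main()
    {
        Console.WriteLine(Describe("/EAN/SomeBrand/appliances/refrigerators/RF444"));
        Console.WriteLine(Describe("/EAN/SomeBrand/appliances"));
    }
}
```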
Try
var pattern = @"^" + @"/EAN"
+ @"(?:/" + @"(?<producer>[^/]+))?"
+ @"(?:/" + @"(?<category>[^/]+))?"
+ @"(?:/" + @"(?<subcategory>[^/]+))?"
+ @"(?:/" + @"(?<product>[^/]+))?";
Note how I replaced the . with [^/], because you want to use the / to split strings. Also note the optional quantifier (?) on each sub-part.

Regular expressions: How to remove all "R.G(*******)" from a string

There are several strings, and I want to remove all "R.G(**)" from these strings. For example:
1. Original string:
Push("Command", string.Format(R.G("#{0} this is a string"), accID));
Result:
Push("Command", string.Format("#{0} this is a string", accID));
2. Original string:
Select(Case(T["AccDirect"]).WhenThen(1, R.G("input")).Else(R.G("output")).As("Direct"));
Result:
Select(Case(T["AccDirect"]).WhenThen(1, "input").Else("output").As("Direct"));
3. Original string:
R.G("this is a \"string\"")
Result:
"this is a \"string\""
4. Original string:
R.G("this is a (string)")
Result:
"this is a (string)"
5. Original string:
AppendLine(string.Format(R.G("[{0}] Error:"), str) + R.G("Contains one of these symbols: \\ / : ; * ? \" \' < > | & +"));
Result:
AppendLine(string.Format("[{0}] Error:", str) + "Contains one of these symbols: \\ / : ; * ? \" \' < > | & +");
6. Original string:
R.G(@"this is the ""1st"" string.
this is the (2nd) string.")
Result:
@"this is the ""1st"" string.
this is the (2nd) string."
Please Help.
Use the pattern below: the full match (group 0) is the text to replace, and capture group 1 is the replacement.
R[.]G[(]"(.*?[^\\])"[)]
Example acting on your #2 and #4 string and a new edge case R.G("this is a (\"string\")")
var pattern = @"R[.]G[(]\""(.*?[^\\])\""[)]";
var str = "Select(Case(T[\"AccDirect\"]).WhenThen(1, R.G(\"input\")).Else(R.G(\"output\")).As(\"Direct\"));";
var str2 = "R.G(\"this is a (string)\")";
var str3 = "R.G(\"this is a (\\\"string\\\")\")";
var res = Regex.Replace(str,pattern, "\"$1\"");
var res2 = Regex.Replace(str2,pattern, "\"$1\"");
var res3 = Regex.Replace(str3,pattern, "\"$1\"");
Try this:
var result = Regex.Replace(input, @"(.*)R\.G\(([^)]*)\)(.*)", "$1$2$3");
explanation:
(.*) # capture any characters
R\.G\( # then match 'R.G('
([^)]*) # then capture anything that isn't ')'
\) # match end parenthesis
(.*) # and capture any characters after
The $1$2$3 replaces your entire match with capture group 1, 2, and 3. Which effectively removes everything that isn't part of those matches, namely the "R.G(*)" part.
Note that you will run into problems if your strings contain 'R.G' or a right parenthesis somewhere, but depending on your input data, maybe this will do the trick well enough.
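For inputs like examples 3 to 6, where the argument contains escaped quotes, parentheses or a verbatim string, one possible approach is to match the whole string literal rather than stopping at the first ")" or quote. The sketch below is an assumption-laden variation, not either answerer's pattern: it assumes R.G is always called with exactly one string literal and nothing else between the parentheses, and the class name StripRG is hypothetical.

```csharp
using System;
using System.Text.RegularExpressions;

public static class StripRG
{
    // First alternative: a verbatim literal @"..." where "" is an escaped quote.
    // Second alternative: a regular literal "..." where a backslash escapes the next character.
    private static readonly Regex Rx = new Regex(
        @"R\.G\((@""(?:[^""]|"""")*""|""(?:[^""\\]|\\.)*"")\)");

    public static string Unwrap(string source)
    {
        // Keep only the captured literal, dropping the surrounding R.G( and ).
        return Rx.Replace(source, "$1");
    }

    public static void Main()
    {
        Console.WriteLine(Unwrap("Select(Case(T[\"AccDirect\"]).WhenThen(1, R.G(\"input\")).Else(R.G(\"output\")).As(\"Direct\"));"));
        Console.WriteLine(Unwrap("R.G(\"this is a (\\\"string\\\")\")"));
        Console.WriteLine(Unwrap("R.G(@\"the \"\"1st\"\" string.\nthe (2nd) string.\")"));
    }
}
```

Calls whose argument is not a single string literal are left untouched by this pattern.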

c# regex parse file in ical format and populate object with results

I'm trying to parse a file that has the following format:
BEGIN:VEVENT
CREATED:20120504T163940Z
DTEND;TZID=America/Chicago:20120504T130000
DTSTAMP:20120504T164000Z
DTSTART;TZID=America/Chicago:20120504T120000
LAST-MODIFIED:20120504T163940Z
SEQUENCE:0
SUMMARY:Test 1
TRANSP:OPAQUE
UID:21F61281-FB76-467F-A2CC-A666688BD9B5
X-RADICALE-NAME:21F61281-FB76-467F-A2CC-A666688BD9B5.ics
END:VEVENT
I need to take the values found after the colon or semi colon on each line and put them into props in an object. I'm attempting to do this with Regex, but I basically forget everything I know about Regex after I use it (which is maybe twice a year). Any help would be appreciated.
edit
This post got me thinking about the iCal format.
Before yesterday, I didn't know what the iCal format was. But after reading the 1998 spec, it's painfully obvious that none of the answers on this page is adequate to parse the content. And it's really too sophisticated even for my general regex below.
With that in mind, here is a solution that parses just the line content, as gleaned from the spec for general line content parsing. It's a step in the right direction, and hopefully someone can benefit. It doesn't do line continuation and does not validate.
C# code
// \p{Cc} is the Unicode control-character category; .NET regexes do not support POSIX classes such as [:cntrl:].
Regex iCalMainRx = new Regex(
@" ^ (?<name> [^\p{Cc}"";:,\n]+ )
  (?<parameter>
    ;
    (?<param_name> [^\p{Cc}"";:,\n]+ )
    =
    (?<param_value>
      (?: (?:[^\S\n]|[^\p{Cc}"";:,])* | "" (?:[^\S\n]|[^\p{Cc}""])* "" )
      (?: , (?: (?:[^\S\n]|[^\p{Cc}"";:,])* | "" (?:[^\S\n]|[^\p{Cc}""])* "" ) )*
    )
  )*
  :
  (?<value> (?:[^\S\n]|[^\p{Cc}])* )
$ ", RegexOptions.IgnorePatternWhitespace);
Regex iCalPvalRx = new Regex(
@" ^ (?<pvals> (?:[^\S\n]|[^\p{Cc}"";:,])* | "" (?:[^\S\n]|[^\p{Cc}""])* "" )
  (?: ,+ (?<pvals> (?:[^\S\n]|[^\p{Cc}"";:,])* | "" (?:[^\S\n]|[^\p{Cc}""])* "" ) )*
$ ", RegexOptions.IgnorePatternWhitespace);
string[] lines = {
"BEGIN:VEVENT",
"CREATED:20120504T163940Z",
"DTEND;TZID=America/Chicago:20120504T130000",
"DTSTAMP:20120504T164000Z",
"DTSTART;TZID=,,,America/Chicago;Next=;last=\"this:;;;:=\";final=:20120504T120000",
"LAST-MODIFIED:20120504T163940Z",
"SEQUENCE:0",
"SUMMARY:Test 1",
"TRANSP:OPAQUE",
"UID:21F61281-FB76-467F-A2CC-A666688BD9B5",
"X-RADICALE-NAME:21F61281-FB76-467F-A2CC-A666688BD9B5.ics",
"END:VEVENT",
};
foreach (string str in lines)
{
Match m_content = iCalMainRx.Match( str );
if (m_content.Success)
{
Console.WriteLine("Key = " + m_content.Groups["name"].Value);
Console.WriteLine("Value = " + m_content.Groups["value"].Value);
CaptureCollection cc_pname = m_content.Groups["param_name"].Captures;
CaptureCollection cc_pvalue = m_content.Groups["param_value"].Captures;
if (cc_pname.Count > 0)
{
Console.WriteLine("Parameters: ");
for (int i = 0; i < cc_pname.Count; i++)
{
// Console.WriteLine("\t'" + cc_pname[i].Value + "' = '" + cc_pvalue[i].Value + "'");
Console.WriteLine("\t'" + cc_pname[i].Value + "' =");
Match m_vals = iCalPvalRx.Match( cc_pvalue[i].Value );
if (m_vals.Success)
{
CaptureCollection cc_vals = m_vals.Groups["pvals"].Captures;
for (int j = 0; j < cc_vals.Count; j++)
{
Console.WriteLine("\t\t'" + cc_vals[j].Value + "'");
}
}
}
}
Console.WriteLine("-------------------------");
}
}
Output
Key = BEGIN
Value = VEVENT
-------------------------
Key = CREATED
Value = 20120504T163940Z
-------------------------
Key = DTEND
Value = 20120504T130000
Parameters:
'TZID' =
'America/Chicago'
-------------------------
Key = DTSTAMP
Value = 20120504T164000Z
-------------------------
Key = DTSTART
Value = 20120504T120000
Parameters:
'TZID' =
''
'America/Chicago'
'Next' =
''
'last' =
'"this:;;;:="'
'final' =
''
-------------------------
Key = LAST-MODIFIED
Value = 20120504T163940Z
-------------------------
Key = SEQUENCE
Value = 0
-------------------------
Key = SUMMARY
Value = Test 1
-------------------------
Key = TRANSP
Value = OPAQUE
-------------------------
Key = UID
Value = 21F61281-FB76-467F-A2CC-A666688BD9B5
-------------------------
Key = X-RADICALE-NAME
Value = 21F61281-FB76-467F-A2CC-A666688BD9B5.ics
-------------------------
Key = END
Value = VEVENT
-------------------------
Splitting into lines and using IndexOf(":") may be enough for simple ICAL files, instead of Regex.
Check whether there is an existing ICAL parser, and see the related questions tagged ical + C#.
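The IndexOf idea can be sketched as follows. This is a minimal illustration under stated assumptions, not a full iCal parser: it assumes no quoted parameter value contains ':', it ignores folded (continued) lines, and the ICalLine class and its Parse method are hypothetical names.

```csharp
using System;
using System.Collections.Generic;

public static class ICalLine
{
    // Splits "NAME;PARAMS:VALUE" at the first colon and drops any ";PARAMS" from the name.
    public static KeyValuePair<string, string> Parse(string line)
    {
        int colon = line.IndexOf(':');
        if (colon < 0) throw new FormatException("No ':' in line: " + line);

        string left = line.Substring(0, colon);
        int semi = left.IndexOf(';');
        string name = semi < 0 ? left : left.Substring(0, semi);

        return new KeyValuePair<string, string>(name, line.Substring(colon + 1));
    }

    public static void Main()
    {
        var pair = Parse("DTEND;TZID=America/Chicago:20120504T130000");
        Console.WriteLine(pair.Key + " -> " + pair.Value); // prints DTEND -> 20120504T130000
    }
}
```

Splitting only at the first colon keeps any later colons as part of the value, which a plain Split(':') would break apart.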
Try:
(?<key>[^:;]*)[:;](?<value>[^\s]*)
C# snippet:
Regex regex = new Regex(
@"(?<key>[^:;]*)[:;](?<value>[^\s]*)",
RegexOptions.None
);
It takes a string of any character but a colon or semicolon as the key, and then anything else but whitespace as the value.
If you want to test it or make changes, check out the regex checker I have on my blog: http://blog.stevekonves.com/2012/01/an-even-better-regex-tester/ (requires silverlight)
Run this with a few examples and see if it does what you want. I get the other comments about splitting or IndexOf but if you're expecting that the delimiter is either a colon or a semicolon then a regex is probably better.
string line = "LAST-MODIFIED:20120504T163940Z";
var p = Regex.Match(line, "(.*)?(:|;)(.*)$", RegexOptions.CultureInvariant | RegexOptions.IgnoreCase | RegexOptions.Singleline);
Console.WriteLine(p.Groups[0].Value);
Console.WriteLine(p.Groups[1].Value);
Console.WriteLine(p.Groups[2].Value);
Console.WriteLine(p.Groups[3].Value);
I'd personally use string.Split(':') for each line in the file. This has the benefit of being easy to read and understand too, if you don't want to relearn regular expressions!
