Part of my application needs to find occurrences of "1p/sec", "22p/sec" or "22p/ sec" (that is, [00-99]p/sec and also [00-99]p/ sec) in a string.
So far I am able to get only the first occurrence (i.e. if it is a single digit, like the one in the string below). I should be able to get any number of occurrences.
Could someone please provide guidance?
string input = "US Canada calling # 1p/ sec (Base Tariff - 11p/sec). Validity : 30 Days.";
// Here we call Regex.Match.
Match match = Regex.Match(input, @"(\d)[p/sec]",
RegexOptions.IgnorePatternWhitespace);
// input.IndexOf("p/sec");
// Here we check the Match instance.
if (match.Success)
{
// Finally, we get the Group value and display it.
string key = match.Groups[1].Value;
Console.WriteLine(key);
}
You need to quantify your \d in the regex, for example by adding a + quantifier, which will then cause \d+ to match at least one but possibly more digits. To restrict to a specific number of digits, you can use the {n,m} quantifier, e.g. \d{1,2} which will then match either one or two digits.
Note also that [p/sec] as you use it in the regex is a character class, matching a single character from the set { c, e, p, s, / }, which is probably not what you want because you'd want to match the p/sec literally.
A more robust option would probably be the following
(\d+)\s*p\s*/\s*sec
which a) matches p/sec literally and b) allows for whitespace between the number and the unit as well as around the /.
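A minimal sketch of how that pattern could be used to collect every occurrence rather than only the first. It assumes Regex.Matches is acceptable in place of Regex.Match; the class and method names (RateFinder, FindRates) are hypothetical and not from the question.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

public static class RateFinder
{
    // Digits, then "p/sec" with optional whitespace around each part.
    private static readonly Regex RatePattern = new Regex(@"(\d+)\s*p\s*/\s*sec");

    // Returns the numeric part of every rate found in the input, in order.
    public static List<string> FindRates(string input)
    {
        var rates = new List<string>();
        foreach (Match m in RatePattern.Matches(input))
        {
            rates.Add(m.Groups[1].Value);
        }
        return rates;
    }

    public static void Main()
    {
        string input = "US Canada calling # 1p/ sec (Base Tariff - 11p/sec). Validity : 30 Days.";
        Console.WriteLine(string.Join(", ", FindRates(input))); // 1, 11
    }
}
```

Because \d+ is unbounded, this would also accept "123p/sec"; swap in \d{1,2} with a leading \b if only one or two digits are valid.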
Use
Match match = Regex.Match(input, @"(\d{1,2})p/sec" ...
instead.
\d matches a single digit. If you append {1,2} to it, you instead match one or two digits. \d* would match zero or more and \d+ would match one or more. \d{1,10} would match 1-10 digits.
If you need to know if it was surrounded by brackets or not you could do
Match match = Regex.Match(input, @"((\[\d{1,2}\])|(\d{1,2}))p/sec"
...
bool hasBrackets = match.Groups[1].Value[0] == '[';
How about this regex:
^\[?\d\d?\]?p/\s*sec$
Explanation:
The regular expression:
^\[?\d\d?\]?p/\s*sec$
matches as follows:
NODE EXPLANATION
----------------------------------------------------------------------
^ the beginning of the string
----------------------------------------------------------------------
\[? '[' (optional (matching the most amount
possible))
----------------------------------------------------------------------
\d digits (0-9)
----------------------------------------------------------------------
\d? digits (0-9) (optional (matching the most
amount possible))
----------------------------------------------------------------------
\]? ']' (optional (matching the most amount
possible))
----------------------------------------------------------------------
p 'p'
----------------------------------------------------------------------
/ '/'
----------------------------------------------------------------------
\s* whitespace (\n, \r, \t, \f, and " ") (0 or
more times (matching the most amount
possible))
----------------------------------------------------------------------
sec 'sec'
----------------------------------------------------------------------
$ before an optional \n, and the end of the
string
----------------------------------------------------------------------
If I understand you correctly, you want to find all occurrences, right? Use a MatchCollection:
string input = "US Canada calling # 1p/sec (Base Tariff - 11p/sec). Validity : 30 Days.";
// Regex.Matches returns every match, not just the first one.
Regex regex = new Regex(@"(\d{1,2})p/sec", RegexOptions.IgnorePatternWhitespace);
MatchCollection matchCollection = regex.Matches(input);
// Iterate over all matches found in the input.
foreach (Match match in matchCollection)
{
Console.WriteLine(match.Groups[0].Value);
}
You can do this:
string input = "US Canada calling # 1p/ sec (Base Tariff - 11p/sec). 91p/ sec , 123p/ sec Validity : 30 Days.";
MatchCollection matches = Regex.Matches(input, @"\b(\d{1,2})p/\s?sec");
foreach (Match m in matches) {
string key = m.Groups[1].Value;
Console.WriteLine(key);
}
Output:
1
11
91
\b is a word boundary, to anchor the match to the start of a "word" (notice that it will not match the "123p/ sec" in the test string!)
\d{1,2} Will match one or two digits. See Quantifiers
p/\s?sec matches a literal "p/sec" with an optional whitespace before "sec"
Regex Expression:
((?<price>(\d+))p/sec)|(\[(?<price>(\d+[-]\d+))\]p/sec)
In C# you need to run a for loop to check multiple captures:
if(match.Success)
{
for(int i = 0; i< match.Groups["price"].Captures.Count;i++)
{
string key = match.Groups["price"].Captures[i].Value;
Console.WriteLine(key);
}
}
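As a rough sketch of how this expression behaves across a whole string: both alternatives write to the same named group, so each match exposes its value through Groups["price"]. The sample input and the names PriceFinder and FindPrices are assumptions made for illustration.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

public static class PriceFinder
{
    // Both alternatives capture into the same named group, "price".
    private static readonly Regex PricePattern = new Regex(
        @"((?<price>(\d+))p/sec)|(\[(?<price>(\d+[-]\d+))\]p/sec)");

    // Returns the "price" value of every match, in order of appearance.
    public static List<string> FindPrices(string input)
    {
        var prices = new List<string>();
        foreach (Match m in PricePattern.Matches(input))
        {
            prices.Add(m.Groups["price"].Value);
        }
        return prices;
    }

    public static void Main()
    {
        // Hypothetical sample input covering both alternatives.
        foreach (string price in FindPrices("1p/sec, 22p/sec and [00-99]p/sec"))
        {
            Console.WriteLine(price);
        }
    }
}
```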
I'm working with a pipe-delimited file. Currently, I use Notepad++ find and replace with the regex pattern ^(?:[^|]*\|){5}\K[^|]*, which replaces the field between the 5th and 6th | on every line with an empty string. I'm trying to do this programmatically, but .NET does not support \K. I've tried a few variations with lookbehind, but I cannot seem to get it right.
string[] lines = File.ReadAllLines(path);
foreach (string line in lines)
{
string line2 = null;
string finalLine = line;
string[] col = line.Split('|');
if (col[5] != null)
{
line2 = Regex.Replace(line, @"^(?:[^|]*\|){5}\K[^|]*", "");
\K is a "workaround" for regex grammars/engines that don't support anchoring against look-behind assertions.
.NET's regex grammar has look-behind assertions (using the syntax (?<=subexpression)), so use them:
Regex.Replace(line, @"(?<=^(?:[^|]*\|){5})[^|]*", "")
In the context of .NET, this pattern now describes:
(?<= # begin (positive) look-behind assertion
^ # match start of string
(?: # begin non-capturing group
[^|]*\| # match (optional) field value + delimiter
){5} # end of group, repeat 5 times
) # end of look-behind assertion
[^|]* # match any non-delimiters (will only occur where the lookbehind is satisfied)
There is no need for lookbehinds; use a capturing group and a backreference instead:
line2 = Regex.Replace(line, @"^((?:[^|]*\|){5})[^|]*", "$1");
See proof.
EXPLANATION
--------------------------------------------------------------------------------
^ the beginning of the string
--------------------------------------------------------------------------------
( group and capture to \1:
--------------------------------------------------------------------------------
(?: group, but do not capture (5 times):
--------------------------------------------------------------------------------
[^|]* any character except: '|' (0 or more
times (matching the most amount
possible))
--------------------------------------------------------------------------------
\| '|'
--------------------------------------------------------------------------------
){5} end of grouping
--------------------------------------------------------------------------------
) end of \1
--------------------------------------------------------------------------------
[^|]* any character except: '|' (0 or more times
(matching the most amount possible))
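To compare the two suggestions, here is a small sketch that runs the lookbehind pattern and the capture-group pattern on the same line. The sample line and the names FieldBlanker, BlankWithLookbehind and BlankWithGroup are hypothetical.

```csharp
using System;
using System.Text.RegularExpressions;

public static class FieldBlanker
{
    // Lookbehind variant: matches only the content of the 6th field.
    public static string BlankWithLookbehind(string line) =>
        Regex.Replace(line, @"(?<=^(?:[^|]*\|){5})[^|]*", "");

    // Capture-group variant: re-emits the first five fields and drops the 6th.
    public static string BlankWithGroup(string line) =>
        Regex.Replace(line, @"^((?:[^|]*\|){5})[^|]*", "$1");

    public static void Main()
    {
        string line = "a|b|c|d|e|f|g"; // hypothetical sample line
        Console.WriteLine(BlankWithLookbehind(line)); // a|b|c|d|e||g
        Console.WriteLine(BlankWithGroup(line));      // a|b|c|d|e||g
    }
}
```

Lines with fewer than six fields do not match either pattern and are returned unchanged.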
I am parsing some manifest files and need to sanitize them before I can load them as XML. So, these files are invalid XML files.
Consider the following snippet:
<assemblyIdentity name=""Microsoft.Windows.Shell.DevicePairingFolder"" processorArchitecture=""amd64"" version=""5.1.0.0"" type="win32" />
There are several instances of double quotes, "", that I want to replace with single occurrences, ".
Essentially, the example would be transformed to
<assemblyIdentity name="Microsoft.Windows.Shell.DevicePairingFolder" processorArchitecture="amd64" version="5.1.0.0" type="win32" />
I presume a regex would be the best approach here, however, it is not my strong point.
The following should be noted:
The manifest is a multiline string (essentially just an XML document)
Something like processorArchitecture="" is valid in the document hence why a simple string.Replace call is not appropriate.
Two ways:
String replace
var newString = s.Replace("\"\"", "\"");
Regex.
string checkStringForDoubleQuotes = @"""""";
string newString = Regex.Replace(s, checkStringForDoubleQuotes, @"""");
After update:
Your regex is this https://regex101.com/r/xZUtUf/1/
""(?=\w)|(?<=\w)""
string s = "test=\"\" test2=\"\"assdasad\"\"";
string checkStringForDoubleQuotes = "\"\"(?=\\w)|(?<=\\w)\"\"";
string newString = Regex.Replace(s, checkStringForDoubleQuotes , "\"");
Console.WriteLine(newString);
// test="" test2="assdasad"
https://dotnetfiddle.net/FmWXUa
Use
(\w+=)""(.*?)""(?=\s+\w+=|$)
Replace with $1"$2". See proof.
Explanation
--------------------------------------------------------------------------------
( group and capture to \1:
--------------------------------------------------------------------------------
\w+ word characters (a-z, A-Z, 0-9, _) (1 or
more times (matching the most amount
possible))
--------------------------------------------------------------------------------
= '='
--------------------------------------------------------------------------------
) end of \1
--------------------------------------------------------------------------------
"" '""'
--------------------------------------------------------------------------------
( group and capture to \2:
--------------------------------------------------------------------------------
.*? any character except \n (0 or more times
(matching the least amount possible))
--------------------------------------------------------------------------------
) end of \2
--------------------------------------------------------------------------------
"" '""'
--------------------------------------------------------------------------------
(?= look ahead to see if there is:
--------------------------------------------------------------------------------
\s+ whitespace (\n, \r, \t, \f, and " ") (1
or more times (matching the most amount
possible))
--------------------------------------------------------------------------------
\w+ word characters (a-z, A-Z, 0-9, _) (1 or
more times (matching the most amount
possible))
--------------------------------------------------------------------------------
= '='
--------------------------------------------------------------------------------
| OR
--------------------------------------------------------------------------------
$ before an optional \n, and the end of
the string
--------------------------------------------------------------------------------
) end of look-ahead
C# example:
using System;
using System.Text.RegularExpressions;
public class Example
{
public static void Main()
{
string pattern = @"(\w+=)""""(.*?)""""(?=\s+\w+=|$)";
string substitution = @"$1""$2""";
string input = @"<assemblyIdentity name=""""Microsoft.Windows.Shell.DevicePairingFolder"""" processorArchitecture=""""amd64"""" version=""""5.1.0.0"""" type=""win32"" />";
Regex regex = new Regex(pattern);
string result = regex.Replace(input, substitution);
Console.Write(result);
}
}
Use the hex escape \x22 for the quote character to make the pattern easier to work with. This replaces every pair of consecutive quotes "" with a single ".
Regex.Replace(data, @"(\x22\x22)", "\x22")
I want a regular expression that matches anything as a parameter in the string concat(1st,2nd) and extracts three matching groups as below:
Group1: concat
Group2: 1st
Group3: 2nd.
I have tried this: ^\s*(concat)\(\s*(.*?)\s*\,\s*(.*)\)\s*$, and it worked fine until I had a parameter containing a comma, as below:
concat(regex(3,4),regex(3,4)). It seems the comma is breaking it. How can I ignore the parameter content and take each parameter as a separate group?
You may use
^\s*(concat)\(\s*((?>\w*\((?>[^()]+|(?<o>)\(|(?<-o>)\))*(?(o)(?!))\)|\w+))\s*,\s*((?>\w*\((?>[^()]+|(?<o>)\(|(?<-o>)\))*(?(o)(?!))\)|\w+))\)\s*$
See the regex demo.
Details
^ - start of string
\s* - 0+ whitespaces
(concat) - Group 1: concat word
\( - a ( char
\s* - 0+ whitespaces
({arg}) - Group 2: arg pattern:
\w* - 0+ word chars
\((?>[^()]+|(?<o>)\(|(?<-o>)\))*(?(o)(?!))\) - (, then any amount of nested parentheses or chars other than ( and ) and then )
|\w+ - or just 1+ word chars
\s*,\s* - a comma enclosed with 0+ whitespaces
({arg}) - Group 3: arg pattern
\) - a ) char
\s* - 0+ whitespaces
$ - end of string.
See C# demo:
var arg = @"(?>\w*\((?>[^()]+|(?<o>)\(|(?<-o>)\))*(?(o)(?!))\)|\w+)";
var pattern = $@"^\s*(concat)\(\s*({arg})\s*,\s*({arg})\)\s*$";
var match = Regex.Match("concat(regex(3,4),regex(3,4))", pattern);
if (match.Success)
{
Console.WriteLine(match.Groups[1].Value);
Console.WriteLine(match.Groups[2].Value);
Console.WriteLine(match.Groups[3].Value);
}
// => concat regex(3,4) regex(3,4)
I have a dwg file with a filename of SMITH 3H FINAL 03-26-2012.dwg and I'm trying to find the right regular expression for validation purposes, because I will have hundreds of files weekly and need to verify that the format of each filename is correct. I know very little about regular expressions. I found the code below, but the filename is not passing as valid. If I'm reading the first line correctly, is it expecting a comma in the filename, and is that why it's not passing as valid?
string filenamePattern = String.Concat("^",
"([a-z',-.]+\\s+)+", // HARRIS, SMITH
"(\\d{1,2}-\\d{1,2}){1}\\s+", // 09-06
"([a-z]+\\s)*", //
"((\\#?\\s*(\\d(\\s*|,))*\\d*-\\d+-?H?D?\\d*?),*\\s+(&\\s)*)+", // #5,6-11H & #4,7,8-11H2, etc
"([a-z()-]+\\s)*", // CLIP-OUT (FINAL)
"(\\d{1,2}-\\d{1,2}(-\\d{2}|-\\d{4})){1}", // 05-11-2009
"\\.dwg", // .dwg
"$");
RegexOptions options = (RegexOptions.IgnorePatternWhitespace | RegexOptions.Multiline | RegexOptions.IgnoreCase);
Regex reg = new Regex(filenamePattern, options);
if (reg.IsMatch(filename))
{
valid = true;
}
According to your comments on the other answer, try:
^[a-z]+(?:[ -][a-z]+)*\s+\d+H\s+[a-z]+\s+\d{2}-\d{2}-\d{4}\.dwg$
explanation:
The regular expression:
(?-imsx:^[a-z]+(?:[ -][a-z]+)*\s+\d+H\s+[a-z]+\s+\d{2}-\d{2}-\d{4}\.dwg$)
matches as follows:
NODE EXPLANATION
----------------------------------------------------------------------
(?-imsx: group, but do not capture (case-sensitive)
(with ^ and $ matching normally) (with . not
matching \n) (matching whitespace and #
normally):
----------------------------------------------------------------------
^ the beginning of the string
----------------------------------------------------------------------
[a-z]+ any character of: 'a' to 'z' (1 or more
times (matching the most amount possible))
----------------------------------------------------------------------
(?: group, but do not capture (0 or more times
(matching the most amount possible)):
----------------------------------------------------------------------
[ -] any character of: ' ', '-'
----------------------------------------------------------------------
[a-z]+ any character of: 'a' to 'z' (1 or more
times (matching the most amount
possible))
----------------------------------------------------------------------
)* end of grouping
----------------------------------------------------------------------
\s+ whitespace (\n, \r, \t, \f, and " ") (1 or
more times (matching the most amount
possible))
----------------------------------------------------------------------
\d+ digits (0-9) (1 or more times (matching
the most amount possible))
----------------------------------------------------------------------
H 'H'
----------------------------------------------------------------------
\s+ whitespace (\n, \r, \t, \f, and " ") (1 or
more times (matching the most amount
possible))
----------------------------------------------------------------------
[a-z]+ any character of: 'a' to 'z' (1 or more
times (matching the most amount possible))
----------------------------------------------------------------------
\s+ whitespace (\n, \r, \t, \f, and " ") (1 or
more times (matching the most amount
possible))
----------------------------------------------------------------------
\d{2} digits (0-9) (2 times)
----------------------------------------------------------------------
- '-'
----------------------------------------------------------------------
\d{2} digits (0-9) (2 times)
----------------------------------------------------------------------
- '-'
----------------------------------------------------------------------
\d{4} digits (0-9) (4 times)
----------------------------------------------------------------------
\. '.'
----------------------------------------------------------------------
dwg 'dwg'
----------------------------------------------------------------------
$ before an optional \n, and the end of the
string
----------------------------------------------------------------------
) end of grouping
----------------------------------------------------------------------
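A minimal sketch of using the pattern above for validation. Note one assumption: it passes RegexOptions.IgnoreCase (as the question's code does), because the pattern uses [a-z] while the sample filename is upper case. The names DwgNameValidator and IsValid are hypothetical.

```csharp
using System;
using System.Text.RegularExpressions;

public static class DwgNameValidator
{
    // Name (optionally several words or hyphenated), well number ending in H,
    // one word, a dd-dd-dddd date, then the .dwg extension.
    private static readonly Regex NamePattern = new Regex(
        @"^[a-z]+(?:[ -][a-z]+)*\s+\d+H\s+[a-z]+\s+\d{2}-\d{2}-\d{4}\.dwg$",
        RegexOptions.IgnoreCase);

    public static bool IsValid(string filename) => NamePattern.IsMatch(filename);

    public static void Main()
    {
        Console.WriteLine(IsValid("SMITH 3H FINAL 03-26-2012.dwg")); // True
        Console.WriteLine(IsValid("SMITH 3H FINAL 3-26-2012.dwg"));  // False
    }
}
```

The pattern only checks the shape of the date (two digits, two digits, four digits), not whether it is a real calendar date.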
This is how I would do it:
// This checks for the name (\w*), then a space, then "3H" (\w{2}, exactly two characters),
// then a space, then a word (\w*), then a space,
// then a date in the form mm-dd-yyyy or dd-mm-yyyy (\d{2}-\d{2}-\d{4})
Regex reg = new Regex(@"(\w*)\s(\w{2})\s(\w*)\s(\d{2}-\d{2}-\d{4})\.dwg");
if(reg.IsMatch(filename))
{
valid = true;
}
You would also be able to get each group. Note, that I didn't have the regex to validate the proper class period (or what I assume is class period, "#5,6-11H & #4,7,8-11H2, etc" part). This will provide a basic framework and then you can pull that group and do the checking in the code. It provides a cleaner regex.
EDIT:
Based on what @DaBears needs, I have come up with the following:
Regex reg = new Regex(@"(\w*|\w*-\w*|\w*\s\w*)\s(\w{2})\s(\w*)\s(\d{2}-\d{2}-\d{4})\.dwg");
if(reg.IsMatch(filename))
{
valid = true;
}
This will match for a last name, a hyphenated name, or a space last name and provide whatever they have in a group.
I've got a problem. I use following regular expression:
Pattern =
(?'name'\w+(?:\w|\s)*), \s*
(?'category'\w+(?:\w|\s)*), \s*
(?:
\{ \s*
[yY]: (?'year'\d+), \s*
[vV]: (?'volume'(?:([1-9][0-9]*\.?[0-9]*)|(\.[0-9]+))+), \s*
\} \s*
,? \s*
)*
with the IgnorePatternWhitespace option.
Everything seemed fine in my application until I debugged it & encountered a problem.
var Year = default(UInt32);
// ...
if((Match = Regex.Match(Line, Pattern, Options)).Success)
{
// Getting Product header information
Name = Match.Groups["name"].Value;
// Gathering Product statistics
for(var ix = default(Int32); ix < Match.Groups["year"].Captures.Count; ix++)
{
// never get here
Year = UInt32.Parse(Match.Groups["year"].Captures[ix].Value, NumberType, Culture);
}
}
So, in the code above, Match is always successful in my case. I get the proper value for Name, but when it comes to the for loop, program flow just passes it by. I debugged it: there are no Captures in Match.Groups["year"], so the behavior is logical, but it is not obvious to me where I went wrong. Help!!
I previously made a related post, "Extract number values enclosed inside curly brackets".
Thanks!
EDIT. Input Samples
Sherwood, reciever, {y:2008,V:5528.35}, {y:2009,V:8653.89}, {y:2010, V:4290.51}
I need to capture 2008, 5528.35, 2009, 8653.89, 2010, 4290.51 values and operate with them as named groups.
2nd EDIT
I tried using ExplicitCapture Option and following expression:
(?<name>\w+(w\| )*), (?<category>\w+(w\| )*), (\{[yY]:(?<year>\d+), *[vV]:(?<volume>(([1-9][0-9]*\.?[0-9]*)|(\.[0-9]+))+)\}(, )?)+
But that didn't help.
Edit: You could simplify by matching everything until the next comma: [^,]*. Here's a full code snippet to match your source data:
var testRegex = new Regex(@"
(?'name'[^,]*),\s*
(?'category'[^,]*),\s*
({y:(?'year'[^,]*),\s*
V:(?'volume'[^,]*),?\s*)*",
RegexOptions.IgnorePatternWhitespace);
var testMatches = testRegex.Matches(
"Sherwood, reciev, {y:2008,V:5528.35}, {y:2009,V:8653.89}, {y:2010, V:4290.51}");
foreach (Match testMatch in testMatches)
{
Console.WriteLine("Name = {0}", testMatch.Groups["name"].Value);
foreach (var capture in testMatch.Groups["year"].Captures)
Console.WriteLine(" Year = {0}", capture);
}
This prints:
Name = Sherwood
Year = 2008
Year = 2009
Year = 2010
I think the problem is a comma:
, \s* \}
which should be optional (or omitted?):
,? \s* \}
To expound on what MRAB said:
(?'name'
\w+
(?:
\w|\s
)*
),
\s*
(?'category'
\w+
(?:
\w|\s
)*
),
\s*
(?:
\{
\s*
[yY]:
(?'year'
\d+
),
\s*
[vV]:
(?'volume'
(?:
( # Why do you need capturing parentheses here?
[1-9][0-9]*
\.?
[0-9]*
)
|
(
\.[0-9]+
)
)+
), # I'm just guessing this comma doesn't match the input samples
\s*
\}
\s*
,?
\s*
)*
Sherwood, reciever, {y:2008,V:5528.35}, {y:2009,V:8653.89}, {y:2010, V:4290.51}
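Putting that suggestion together, here is a sketch of a pattern without the comma after the volume, reading the repeated year and volume groups through Captures. It is only one possible variant: the name and category sub-patterns are simplified to [^,]+ and the volume to \d+(?:\.\d+)?, which are assumptions, and ProductLineParser and ParseStats are hypothetical names.

```csharp
using System;
using System.Collections.Generic;
using System.Globalization;
using System.Text.RegularExpressions;

public static class ProductLineParser
{
    // No comma is required between the volume and the closing brace.
    private static readonly Regex LinePattern = new Regex(@"
        (?'name'[^,]+), \s*
        (?'category'[^,]+), \s*
        (?:
            \{ \s*
            [yY]: (?'year'\d+), \s*
            [vV]: (?'volume'\d+(?:\.\d+)?) \s*
            \} \s*
            ,? \s*
        )*",
        RegexOptions.IgnorePatternWhitespace);

    // Returns one (year, volume) pair per {y:...,V:...} block on the line.
    public static List<KeyValuePair<uint, decimal>> ParseStats(string line)
    {
        var stats = new List<KeyValuePair<uint, decimal>>();
        Match match = LinePattern.Match(line);
        if (!match.Success) return stats;

        // Each repetition of the group adds one capture to both collections.
        CaptureCollection years = match.Groups["year"].Captures;
        CaptureCollection volumes = match.Groups["volume"].Captures;
        for (int i = 0; i < years.Count; i++)
        {
            stats.Add(new KeyValuePair<uint, decimal>(
                uint.Parse(years[i].Value, CultureInfo.InvariantCulture),
                decimal.Parse(volumes[i].Value, CultureInfo.InvariantCulture)));
        }
        return stats;
    }

    public static void Main()
    {
        string line = "Sherwood, reciever, {y:2008,V:5528.35}, {y:2009,V:8653.89}, {y:2010, V:4290.51}";
        foreach (var pair in ParseStats(line))
        {
            Console.WriteLine(string.Format(CultureInfo.InvariantCulture,
                "Year = {0}, Volume = {1}", pair.Key, pair.Value));
        }
    }
}
```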