I am passing a correctly formatted string, but the match does not return true.
string dimensionsString= "13.5 inches high x 11.42 inches wide x 16.26 inches deep";
// or 10.1 x 12.5 x 30.9 inches
// or 10.1 x 12.5 x 30.9 inches ; 3.2 pounds
Regex rgxFormat = new Regex(@"^([0-9\.]+) ([a-z]+) x ([0-9\.]+) ([a-z]+) x ([0-9\.]+) ([a-z]+)( ; ([0-9\.]+) ([a-z]+))?$");
if (rgxFormat.IsMatch(dimensionsString))
{
//match
}
I can't understand what's wrong with the code.
Your pattern only accounts for single words after the numbers. Allow any number of symbols there (with .* or .*?) to fix the pattern:
^([0-9.]+) (.*?) x ([0-9\.]+) (.*?) x ([0-9.]+) (.*?)( ; ([0-9.]+) (.*))?$
See the regex demo.
Note that the last .* is used with a greedy quantifier since it is the last unknown bit in the string (to match all the rest of the string). The .*? are non-greedy versions that match as few occurrences of any char but a newline as possible.
Replace regular spaces with \s to match any kind of whitespace if necessary.
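A quick way to check the relaxed pattern is to run it against the sample strings from the question. This is a minimal sketch with illustrative variable names. Note that, as written, the pattern still expects a space and some text between each number and the following " x ", so the unit-less variant `10.1 x 12.5 x 30.9 inches` from the question's comments is not matched by it.

```csharp
using System;
using System.Text.RegularExpressions;

// Pattern copied from the answer above; variable names are illustrative.
var rgxFormat = new Regex(
    @"^([0-9.]+) (.*?) x ([0-9\.]+) (.*?) x ([0-9.]+) (.*?)( ; ([0-9.]+) (.*))?$");

string full = "13.5 inches high x 11.42 inches wide x 16.26 inches deep";
string withWeight = full + " ; 3.2 pounds";
string bare = "10.1 x 12.5 x 30.9 inches";

bool fullOk = rgxFormat.IsMatch(full);           // multi-word units are accepted
Match weightMatch = rgxFormat.Match(withWeight); // optional " ; weight unit" tail
bool bareOk = rgxFormat.IsMatch(bare);           // no text after the first two numbers

Console.WriteLine(fullOk);                       // True
Console.WriteLine(weightMatch.Groups[8].Value);  // 3.2
Console.WriteLine(bareOk);                       // False
```

If the unit-less form also has to pass, the unit parts would need to be made optional together with their leading space; that is a change beyond what the answer proposes.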
Related
I am trying to redact dollar amounts in some PDFs using C#. Below is what I have tried:
@"/ (\d)(?= (?:\d{ 3})+(?:\.|$))| (\.\d\d ?)\d *$/ g"
@"(?<=each)(((\d*[,|.]\d{2,3}))*)"
@"\d+\.\d{2}"
Here are some test cases that it needs to match
76,249.25
131,588.00
7.09
21.27
420.42
54.77
32.848
3,056.12
0.009
0.01
32.85
2,948.59
$99,249.25
$9.0000
$1,800.0000
$1,000,000
Here are some test cases that it should not target
666-257-6443
F1A 5G9
Bolt, Locating, M8 x 1.25 x 30 L
Precision Washer, 304 SS, 0.63 OD x 0.31
Flat Washer 300 Series SS; Pack of 50
U-SSFAN 0.63-L6.00-F0.75-B0.64-T0.38-SC5.62
U-CLBUM 0.63-D0.88-L0.875
U-WSSS 0.38-D0.88-T0.125
U-BGHK 6002ZZ - H1.50
U-SSCS 0.38-B0.38
6412K42
Std Dowel, 3/8" x 1-1/2" Lg, Steel
2019.07.05
2092-002.0180
SHCMG 0.25-L1.00
280160717
Please note that the C# portion interfaces with iText 7 pdfSweep.
Guid g = new Guid();
CompositeCleanupStrategy strategy = new CompositeCleanupStrategy();
string guid = g.ToString();
string input = @"C:\Users\JM\Documents\pdftest\61882 _280011434 (1).pdf";
string output = @"C:\Users\JM\Documents\pdftest\61882 _2800011434 (1) x2" + guid+".pdf";
string regex = @"(?m)^\$?[0-9]{1,3}(?:,[0-9]{3})*(?:\.[0-9]+)?$";
strategy.Add(new RegexBasedCleanupStrategy(regex));
PdfDocument pdf = new PdfDocument(new PdfReader(input), new PdfWriter(output));
PdfAutoSweep autoSweep = new PdfAutoSweep(strategy);
autoSweep.CleanUp(pdf);
pdf.Close();
Please share your wisdom
You may use
\$?[0-9]{1,3}(?:,[0-9]{3})*(?:\.[0-9]+)?
Or, if the prices occur on whole lines:
^\$?[0-9]{1,3}(?:,[0-9]{3})*(?:\.[0-9]+)?$
See the regex demo
Bonus: To obtain only price values, you need to remove the ? after \$ to make it obligatory:
\$([0-9]{1,3}(?:,[0-9]{3})*(?:\.[0-9]+)?)
(I added a capturing group in case you need to access the number value separately from the $ char).
If you need to support any currency char, not just $, replace \$ with \p{Sc}.
Details
^ - start of string
\$? - an optional dollar symbol
[0-9]{1,3} - one to three digits
(?:,[0-9]{3})* - any 0 or more repetitions of a comma and then three digits
(?:\.[0-9]+)? - an optional sequence of a dot and then any 1 or more digits
$ - end of string.
C# check for a match:
if (Regex.IsMatch(str, @"^\$?[0-9]{1,3}(?:,[0-9]{3})*(?:\.[0-9]+)?$"))
{
// there is a match
}
pdfSweep notice:
Apply the fix from this answer. The point is that the line breaks are lost when parsing the text. The regex you need then is
@"(?m)^\$?[0-9]{1,3}(?:,[0-9]{3})*(?:\.[0-9]+)?\r?$"
where (?m) makes ^ and $ match start/end of lines and \r? is required as $ only matches before LF, not before CRLF in .NET regex.
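The whole-line pattern can be checked against a few of the test cases from the question. The sketch below exercises only the regex on plain strings; it does not simulate how pdfSweep extracts text, and the variable names are illustrative.

```csharp
using System;
using System.Linq;
using System.Text.RegularExpressions;

// Whole-string pattern from the answer.
var price = new Regex(@"^\$?[0-9]{1,3}(?:,[0-9]{3})*(?:\.[0-9]+)?$");

string[] shouldMatch = { "76,249.25", "0.009", "$9.0000", "$1,000,000" };
string[] shouldNotMatch = { "666-257-6443", "F1A 5G9", "2019.07.05", "280160717" };

bool allMatch = shouldMatch.All(s => price.IsMatch(s));
bool noneMatch = !shouldNotMatch.Any(s => price.IsMatch(s));

// Multiline variant: \r? lets $ succeed on CRLF-terminated lines.
var linePrice = new Regex(@"(?m)^\$?[0-9]{1,3}(?:,[0-9]{3})*(?:\.[0-9]+)?\r?$");
string text = "7.09\r\nF1A 5G9\r\n$1,800.0000\r\n";

// The optional \r is part of each match, so trim it from the values.
string[] found = linePrice.Matches(text)
    .Cast<Match>()
    .Select(m => m.Value.TrimEnd('\r'))
    .ToArray();

Console.WriteLine(allMatch);                 // True
Console.WriteLine(noneMatch);                // True
Console.WriteLine(string.Join("|", found));  // 7.09|$1,800.0000
```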
I want to replace all the floating numbers from a mathematical expression with letters using regular expressions. This is what I've tried:
Regex rx = new Regex("[-]?([0-9]*[.])?[0-9]+");
string expression = "((-30+5.2)*(2+7))-((-3.1*2.5)-9.12)";
char letter = 'a';
while (rx.IsMatch(expression))
{
expression = rx.Replace(expression , letter.ToString(), 1);
letter++;
}
The problem is that if I have for example (5-2)+3 it will replace it to: (ab)+c
So it gets the -2 as a number but I don't want that.
I am not experienced with Regex but I think I need something like this:
Check for '-'; if there is one, check whether there is a number or a right parenthesis before it. If there is NOT, then keep the '-' as part of the number.
After that check for digits + dot + digits
My above Regex also works with values like: .2 .3 .4 but I don't need that, it should be explicit: 0.2 0.3 0.4
Following the suggested logic, you may consider
(?:(?<![)0-9])-)?[0-9]+(?:\.[0-9]+)?
See the regex demo.
Regex details
(?:(?<![)0-9])-)? - an optional non-capturing group matching 1 or 0 occurrences of
(?<![)0-9]) - a place in string that is not immediately preceded with a ) or digit
- - a minus
[0-9]+ - 1+ digits
(?:\.[0-9]+)? - an optional non-capturing group matching 1 or 0 occurrences of a . followed with 1+ digits.
In code, it is better to use a match evaluator (see the C# demo online):
Regex rx = new Regex(@"(?:(?<![)0-9])-)?[0-9]+(?:\.[0-9]+)?");
string expression = "((-30+5.2)*(2+7))-((-3.1*2.5)-9.12)";
char letter = (char)96; // char before a in ASCII table
string result = rx.Replace(expression, m =>
{
letter++; // char is incremented
return letter.ToString();
}
);
Console.WriteLine(result); // => ((a+b)*(c+d))-((e*f)-g)
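The problem case from the question, (5-2)+3, can be checked the same way. In this small sketch (variable names are illustrative) the minus follows a digit, so the lookbehind fails and the minus stays in the expression as an operator:

```csharp
using System;
using System.Text.RegularExpressions;

var rx = new Regex(@"(?:(?<![)0-9])-)?[0-9]+(?:\.[0-9]+)?");
char letter = 'a';

// letter++ yields the current letter, then advances it for the next match.
string replaced = rx.Replace("(5-2)+3", m => (letter++).ToString());

Console.WriteLine(replaced); // (a-b)+c
```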
I am a bit out of practice after two years of not coding.
I have thousands of lines in a txt file, all similar to these:
X0 Y0 S-0.30
X0 Y0 S-0.21
X0 Y0 S-0.08
I need to remove the S-x.xx value from all lines. So only the X Y and the relevant values will be saved for each line.
I have attempted with Regex
if (Line.Contains("S"))
{
Regex rgx = new Regex(@"S\d+(\.\d+)?");
Line = rgx.Replace(Line, "");
}
return Line;
But I am not getting the result I expect. Any hint will be appreciated.
Thanks
You can use the following RegEx : (X\d+ Y\d+).*
And use $1.
X matches the character X literally (case sensitive)
\d+ matches a digit (equal to [0-9]) one or more times
Y matches the characters Y literally (case sensitive)
\d+ matches a digit (equal to [0-9]) one or more times
You may use
var result = Regex.Replace(s, @"\s+S-\d+(?:\.\d+)?", string.Empty);
See the regex demo
\s+ - 1+ whitespaces
S- - S- substring
\d+(?:\.\d+)? - a number pattern:
\d+ - 1+ digits
(?:\.\d+)? - an optional sequence:
\. - dot
\d+ - 1+ digits.
Appreciate that you really just want to remove the final third term, which begins with S. Just try replacing the pattern \sS.*$ with empty string:
Regex rgx = new Regex(@"\sS.*$");
string Line = "X0 Y0 S-0.30";
Line = rgx.Replace(Line, "");
Console.WriteLine(Line);
X0 Y0
Demo
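For context, the attempt in the question, S\d+(\.\d+)?, requires a digit directly after the S, so it never matches S-0.30. The sketch below shows that and then applies the \s+S-\d+(?:\.\d+)? pattern from one of the answers to the sample lines; the in-memory array stands in for the lines of the text file and is an assumption made for illustration.

```csharp
using System;
using System.Linq;
using System.Text.RegularExpressions;

// Sample lines from the question; in practice these would be read from the file.
string[] lines = { "X0 Y0 S-0.30", "X0 Y0 S-0.21", "X0 Y0 S-0.08" };

// The original pattern has no place for the minus sign, so it finds nothing.
bool originalMatches = Regex.IsMatch(lines[0], @"S\d+(\.\d+)?");

// Remove the whitespace plus the S-x.xx term from every line.
string[] cleaned = lines
    .Select(line => Regex.Replace(line, @"\s+S-\d+(?:\.\d+)?", string.Empty))
    .ToArray();

Console.WriteLine(originalMatches); // False
foreach (string line in cleaned)
    Console.WriteLine(line);        // X0 Y0 (three times)
```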
I have to write a function that will get a string and it will have 2 forms:
XX..X,YY..Y where XX..X are max 4 characters and YY..Y are max 26 characters(X and Y are digits or A or B)
XX..X where XX..X are max 8 characters (X is digit or A or B)
e.g. 12A,784B52 or 4453AB
How can I use Regex grouping to match this behavior?
Thanks.
P.S. Sorry if this is too localized.
You can use named captures for this:
Regex regexObj = new Regex(
@"\b # Match a word boundary
(?: # Either match
(?<X>[AB\d]{1,4}) # 1-4 characters --> group X
, # comma
(?<Y>[AB\d]{1,26}) # 1-26 characters --> group Y
| # or
(?<X>[AB\d]{1,8}) # 1-8 characters --> group X
) # End of alternation
\b # Match a word boundary",
RegexOptions.IgnorePatternWhitespace);
Match match = regexObj.Match(subjectString); // run the regex once and reuse the result
string X = match.Groups["X"].Value;
string Y = match.Groups["Y"].Value;
I don't know what happens if there is no group Y, perhaps you might need to wrap the last line in an if statement.
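In .NET, a named group that did not take part in the match has Success set to false and an empty Value, so an explicit check is optional. A minimal sketch, using a compact version of the same pattern without the inline comments (the variable names are illustrative):

```csharp
using System;
using System.Text.RegularExpressions;

// Same alternation as above, written on one line.
var regexObj = new Regex(
    @"\b(?:(?<X>[AB\d]{1,4}),(?<Y>[AB\d]{1,26})|(?<X>[AB\d]{1,8}))\b");

Match two = regexObj.Match("12A,784B52"); // first form: X,Y
Match one = regexObj.Match("4453AB");     // second form: X only

Console.WriteLine(two.Groups["X"].Value);   // 12A
Console.WriteLine(two.Groups["Y"].Value);   // 784B52
Console.WriteLine(one.Groups["X"].Value);   // 4453AB
Console.WriteLine(one.Groups["Y"].Success); // False
```

Checking Groups["Y"].Success is a way to tell the two input forms apart.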
Is there a simple way to move the percent sign so that it comes after the value:
120 # %60 {a} >> 120 # 60% {a}
Try this:
string input = "120 # %60 {a}";
string pattern = @"%(\d+)";
string result = Regex.Replace(input, pattern, "$1%");
Console.WriteLine(result);
The %(\d+) pattern matches a % symbol followed by at least one digit. The digits are captured in a group which is referenced via the $1 in the replacement pattern $1%, which ends up placing the % symbol after the captured number.
If you need to account for numbers with decimal places, such as %60.50, you can use this pattern instead: @"%(\d+(?:\.\d+)?)"
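A short sketch of the decimal-aware pattern applied to both an integer and a decimal input (the variable names are illustrative):

```csharp
using System;
using System.Text.RegularExpressions;

string decimalPattern = @"%(\d+(?:\.\d+)?)";

// $1 is the captured number; the % is re-emitted after it.
string movedInt = Regex.Replace("120 # %60 {a}", decimalPattern, "$1%");
string movedDec = Regex.Replace("120 # %60.50 {a}", decimalPattern, "$1%");

Console.WriteLine(movedInt); // 120 # 60% {a}
Console.WriteLine(movedDec); // 120 # 60.50% {a}
```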