How to match a string format with Regex in C#

I am passing a correctly formatted string, but IsMatch does not return true.
string dimensionsString= "13.5 inches high x 11.42 inches wide x 16.26 inches deep";
// or 10.1 x 12.5 x 30.9 inches
// or 10.1 x 12.5 x 30.9 inches ; 3.2 pounds
Regex rgxFormat = new Regex(@"^([0-9\.]+) ([a-z]+) x ([0-9\.]+) ([a-z]+) x ([0-9\.]+) ([a-z]+)( ; ([0-9\.]+) ([a-z]+))?$");
if (rgxFormat.IsMatch(dimensionsString))
{
//match
}
I can't understand what's wrong with the code.

Your pattern only accounts for single words after the numbers. Allow any number of symbols there (with .* or .*?) to fix the pattern:
^([0-9.]+) (.*?) x ([0-9\.]+) (.*?) x ([0-9.]+) (.*?)( ; ([0-9.]+) (.*))?$
See the regex demo.
Note that the last .* is used with a greedy quantifier since it is the last unknown bit in the string (to match all the rest of the string). The .*? are non-greedy versions that match as few occurrences of any char but a newline as possible.
Replace regular spaces with \s to match any kind of whitespace if necessary.
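As a minimal sketch of how the corrected pattern could be used to pull out the values and units, consider the helper below. The class and method names (DimensionParser, Parse) are made up for illustration. Note that the pattern still expects a unit after every number, so the shorter formats from the question's comments (such as 10.1 x 12.5 x 30.9 inches) are not matched by it as written.

```csharp
using System;
using System.Text.RegularExpressions;

// Hypothetical helper, not part of the original question or answer.
public static class DimensionParser
{
    // Pattern from the answer above; the lazy .*? lets a unit span several words.
    private static readonly Regex Format = new Regex(
        @"^([0-9.]+) (.*?) x ([0-9.]+) (.*?) x ([0-9.]+) (.*?)( ; ([0-9.]+) (.*))?$");

    // Returns value/unit pairs as a flat array of six strings,
    // or an empty array when the input does not match.
    public static string[] Parse(string input)
    {
        Match m = Format.Match(input);
        if (!m.Success)
        {
            return Array.Empty<string>();
        }
        return new[]
        {
            m.Groups[1].Value, m.Groups[2].Value,
            m.Groups[3].Value, m.Groups[4].Value,
            m.Groups[5].Value, m.Groups[6].Value
        };
    }
}
```

Calling DimensionParser.Parse with the string from the question should give 13.5, inches high, 11.42, inches wide, 16.26, inches deep.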

Related

Extracting dollar prices and numbers with comma as thousand separator from PDF converted to text format

I am trying to redact some pdfs with dollar amounts using c#. Below is what I have tried
@"/ (\d)(?= (?:\d{ 3})+(?:\.|$))| (\.\d\d ?)\d *$/ g"
@"(?<=each)(((\d*[,|.]\d{2,3}))*)"
@"(?<=each)(((\d*[,|.]\d{2,3}))*)"
@"\d+\.\d{2}"
Here are some test cases that it needs to match
76,249.25
131,588.00
7.09
21.27
420.42
54.77
32.848
3,056.12
0.009
0.01
32.85
2,948.59
$99,249.25
$9.0000
$1,800.0000
$1,000,000
Here are some test cases that it should not target
666-257-6443
F1A 5G9
Bolt, Locating, M8 x 1.25 x 30 L
Precision Washer, 304 SS, 0.63 OD x 0.31
Flat Washer 300 Series SS; Pack of 50
U-SSFAN 0.63-L6.00-F0.75-B0.64-T0.38-SC5.62
U-CLBUM 0.63-D0.88-L0.875
U-WSSS 0.38-D0.88-T0.125
U-BGHK 6002ZZ - H1.50
U-SSCS 0.38-B0.38
6412K42
Std Dowel, 3/8" x 1-1/2" Lg, Steel
2019.07.05
2092-002.0180
SHCMG 0.25-L1.00
280160717
Please note the c# portion is interfacing with iText 7 pdfSweep.
Guid g = Guid.NewGuid(); // new Guid() would always yield the all-zero GUID
CompositeCleanupStrategy strategy = new CompositeCleanupStrategy();
string guid = g.ToString();
string input = @"C:\Users\JM\Documents\pdftest\61882 _280011434 (1).pdf";
string output = @"C:\Users\JM\Documents\pdftest\61882 _2800011434 (1) x2" + guid+".pdf";
string regex = @"(?m)^\$?[0-9]{1,3}(?:,[0-9]{3})*(?:\.[0-9]+)?$";
strategy.Add(new RegexBasedCleanupStrategy(regex));
PdfDocument pdf = new PdfDocument(new PdfReader(input), new PdfWriter(output));
PdfAutoSweep autoSweep = new PdfAutoSweep(strategy);
autoSweep.CleanUp(pdf);
pdf.Close();
Please share your wisdom
You may use
\$?[0-9]{1,3}(?:,[0-9]{3})*(?:\.[0-9]+)?
Or, if the prices occur on whole lines:
^\$?[0-9]{1,3}(?:,[0-9]{3})*(?:\.[0-9]+)?$
See the regex demo
Bonus: To obtain only price values, you need to remove the ? after \$ to make it obligatory:
\$([0-9]{1,3}(?:,[0-9]{3})*(?:\.[0-9]+)?)
(I added a capturing group in case you need to access the number value separately from the $ char).
If you need to support any currency char, not just $, replace \$ with \p{Sc}.
Details
^ - start of string
\$? - an optional dollar symbol
[0-9]{1,3} - one to three digits
(?:,[0-9]{3})* - any 0 or more repetitions of a comma and then three digits
(?:\.[0-9]+)? - an optional sequence of a dot and then any 1 or more digits
$ - end of string.
C# check for a match:
if (Regex.IsMatch(str, @"^\$?[0-9]{1,3}(?:,[0-9]{3})*(?:\.[0-9]+)?$"))
{
// there is a match
}
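To check the whole-string pattern against the test cases from the question, a small wrapper like the following could be used. The PriceMatcher name is only an illustration; the pattern itself is the anchored one shown above.

```csharp
using System.Text.RegularExpressions;

// Hypothetical wrapper around the anchored price pattern.
public static class PriceMatcher
{
    // Optional $, 1-3 digits, optional ,ddd groups, optional fractional part.
    private static readonly Regex Price = new Regex(
        @"^\$?[0-9]{1,3}(?:,[0-9]{3})*(?:\.[0-9]+)?$");

    // True when the whole string looks like a price or a plain amount.
    public static bool IsPrice(string s)
    {
        return Price.IsMatch(s);
    }
}
```

With this, inputs such as 76,249.25, 0.009 and $1,000,000 are accepted, while 2019.07.05, 280160717 and 666-257-6443 are rejected, because a run of more than three digits without a comma cannot satisfy [0-9]{1,3} followed by the optional groups.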
pdfSweep notice:
Apply the fix from this answer. The point is that the line breaks are lost when parsing the text. The regex you need then is
@"(?m)^\$?[0-9]{1,3}(?:,[0-9]{3})*(?:\.[0-9]+)?\r?$"
where (?m) makes ^ and $ match start/end of lines and \r? is required as $ only matches before LF, not before CRLF in .NET regex.
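Outside of pdfSweep, the effect of (?m) and \r? can be seen on a plain multi-line string with CRLF line endings. This is only a sketch: the PriceLines class and the sample text in the usage note are made up, and it does not involve any iText API. Since \r? is part of the match, the trailing carriage return is trimmed from each result.

```csharp
using System.Collections.Generic;
using System.Text.RegularExpressions;

// Hypothetical helper that collects whole-line prices from multi-line text.
public static class PriceLines
{
    private static readonly Regex LinePrice = new Regex(
        @"(?m)^\$?[0-9]{1,3}(?:,[0-9]{3})*(?:\.[0-9]+)?\r?$");

    public static List<string> Find(string text)
    {
        var found = new List<string>();
        foreach (Match m in LinePrice.Matches(text))
        {
            // The optional \r is consumed by the pattern, so strip it here.
            found.Add(m.Value.TrimEnd('\r'));
        }
        return found;
    }
}
```

For the text "Total\r\n$1,800.0000\r\n2019.07.05\r\n7.09", Find should return $1,800.0000 and 7.09 only.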

Replace floating numbers in math equation with letter variables

I want to replace all the floating numbers from a mathematical expression with letters using regular expressions. This is what I've tried:
Regex rx = new Regex("[-]?([0-9]*[.])?[0-9]+");
string expression = "((-30+5.2)*(2+7))-((-3.1*2.5)-9.12)";
char letter = 'a';
while (rx.IsMatch(expression))
{
expression = rx.Replace(expression , letter.ToString(), 1);
letter++;
}
The problem is that if I have for example (5-2)+3 it will replace it to: (ab)+c
So it gets the -2 as a number but I don't want that.
I am not experienced with Regex but I think I need something like this:
Check for '-'; if there is one, check whether there is a number or a right parenthesis before it. If there is NOT, then keep the '-' as part of the number.
After that check for digits + dot + digits
My above Regex also works with values like: .2 .3 .4 but I don't need that, it should be explicit: 0.2 0.3 0.4
Following the suggested logic, you may consider
(?:(?<![)0-9])-)?[0-9]+(?:\.[0-9]+)?
See the regex demo.
Regex details
(?:(?<![)0-9])-)? - an optional non-capturing group matching 1 or 0 occurrences of
(?<![)0-9]) - a place in string that is not immediately preceded with a ) or digit
- - a minus
[0-9]+ - 1+ digits
(?:\.[0-9]+)? - an optional non-capturing group matching 1 or 0 occurrences of a . followed with 1+ digits.
In code, it is better to use a match evaluator (see the C# demo online):
Regex rx = new Regex(@"(?:(?<![)0-9])-)?[0-9]+(?:\.[0-9]+)?");
string expression = "((-30+5.2)*(2+7))-((-3.1*2.5)-9.12)";
char letter = (char)96; // char before a in ASCII table
string result = rx.Replace(expression, m =>
{
letter++; // char is incremented
return letter.ToString();
}
);
Console.WriteLine(result); // => ((a+b)*(c+d))-((e*f)-g)
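If the original numbers are needed later (for instance, to substitute them back), the match evaluator can also record which letter replaced which number. The sketch below assumes that is useful; the ExpressionVariables class and its ToVariables method are hypothetical names, and the code does not guard against more than 26 numbers.

```csharp
using System.Collections.Generic;
using System.Text.RegularExpressions;

// Hypothetical helper built around the pattern from the answer.
public static class ExpressionVariables
{
    private static readonly Regex Number = new Regex(
        @"(?:(?<![)0-9])-)?[0-9]+(?:\.[0-9]+)?");

    // Replaces each number with a, b, c, ... and stores letter -> number text.
    // Assumption: the expression contains at most 26 numbers.
    public static string ToVariables(string expression, IDictionary<char, string> values)
    {
        char letter = 'a';
        return Number.Replace(expression, m =>
        {
            char current = letter++;
            values[current] = m.Value;
            return current.ToString();
        });
    }
}
```

For (5-2)+3 this yields (a-b)+c, since the minus is preceded by a digit and is therefore treated as an operator, not as a sign.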

Regex remove character following n characters

I am a bit out of practice after two years of not coding.
I have thousands of lines in a txt file, all similar to these:
X0 Y0 S-0.30
X0 Y0 S-0.21
X0 Y0 S-0.08
I need to remove the S-x.xx value from all lines. So only the X Y and the relevant values will be saved for each line.
I have attempted with Regex
if (Line.Contains("S"))
{
Regex rgx = new Regex(@"S\d+(\.\d+)?");
Line = rgx.Replace(Line, "");
}
return Line;
But I am not getting the result I expect. Any hint will be appreciated.
Thanks
You can use the following regex: (X\d+ Y\d+).*
and use $1 as the replacement.
X matches the character X literally (case sensitive)
\d+ matches a digit (equal to [0-9]) one or more times
Y matches the character Y literally (case sensitive)
\d+ matches a digit (equal to [0-9]) one or more times
You may use
var result = Regex.Replace(s, @"\s+S-\d+(?:\.\d+)?", string.Empty);
See the regex demo
\s+ - 1+ whitespaces
S- - S- substring
\d+(?:\.\d+)? - a number pattern:
\d+ - 1+ digits
(?:\.\d+)? - an optional sequence:
\. - dot
\d+ - 1+ digits.
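The pattern in the question, S\d+(\.\d+)?, never matches these lines because the S is followed by a minus sign, not directly by a digit. A small sketch applying the pattern above to a single line could look like this; the SWordRemover class name is made up for illustration.

```csharp
using System.Text.RegularExpressions;

// Hypothetical helper that strips the trailing S-x.xx word from a line.
public static class SWordRemover
{
    // Whitespace, then "S-", then an integer or decimal number.
    private static readonly Regex SWord = new Regex(@"\s+S-\d+(?:\.\d+)?");

    public static string Strip(string line)
    {
        return SWord.Replace(line, string.Empty);
    }
}
```

SWordRemover.Strip("X0 Y0 S-0.30") should give X0 Y0. If some S values can be positive, changing S- to S-? in the pattern would be one way to cover them, though that goes beyond the sample data shown.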
Note that you really just want to remove the third and final term, which begins with S. Try replacing the pattern \sS.*$ with an empty string:
Regex rgx = new Regex(@"\sS.*$");
string Line = "X0 Y0 S-0.30";
Line = rgx.Replace(Line, "");
Console.WriteLine(Line);
X0 Y0
Demo

C# Regex Grouping

I have to write a function that will get a string and it will have 2 forms:
XX..X,YY..Y where XX..X are max 4 characters and YY..Y are max 26 characters(X and Y are digits or A or B)
XX..X where XX..X are max 8 characters (X is digit or A or B)
e.g. 12A,784B52 or 4453AB
How can I use Regex grouping to match this behavior?
Thanks.
P.S. Sorry if this is too localized.
You can use named captures for this:
Regex regexObj = new Regex(
@"\b # Match a word boundary
(?: # Either match
(?<X>[AB\d]{1,4}) # 1-4 characters --> group X
, # comma
(?<Y>[AB\d]{1,26}) # 1-26 characters --> group Y
| # or
(?<X>[AB\d]{1,8}) # 1-8 characters --> group X
) # End of alternation
\b # Match a word boundary",
RegexOptions.IgnorePatternWhitespace);
Match m = regexObj.Match(subjectString);
string X = m.Groups["X"].Value;
string Y = m.Groups["Y"].Value;
If group Y does not take part in the match, m.Groups["Y"].Success is false and its Value is an empty string, so check Success if you need to tell the two forms apart.
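If the goal is to validate the whole input and not to find such a token inside a longer text, anchoring with ^ and $ in place of \b may be a better fit: with \b, an input such as 12345,6 still yields a match on 12345 alone. The following is a sketch under that assumption; CodeParser and TryParse are hypothetical names, and [AB0-9] is used in place of [AB\d] to restrict the match to ASCII digits.

```csharp
using System.Text.RegularExpressions;

// Hypothetical validator for the two input forms described in the question.
public static class CodeParser
{
    // .NET allows the same group name (X) in both alternatives.
    private static readonly Regex Pattern = new Regex(
        @"^(?:(?<X>[AB0-9]{1,4}),(?<Y>[AB0-9]{1,26})|(?<X>[AB0-9]{1,8}))$");

    // Returns false when the input has neither form; y is empty for the short form.
    public static bool TryParse(string input, out string x, out string y)
    {
        x = string.Empty;
        y = string.Empty;
        Match m = Pattern.Match(input);
        if (!m.Success)
        {
            return false;
        }
        x = m.Groups["X"].Value;
        if (m.Groups["Y"].Success)
        {
            y = m.Groups["Y"].Value;
        }
        return true;
    }
}
```

With this, 12A,784B52 gives X = 12A and Y = 784B52, 4453AB gives X = 4453AB with an empty Y, and 12345,6 is rejected.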

Percentage position move

Is there a simple way to move the percent sign after the value:
120 # %60 {a} >> 120 # 60% {a}
Try this:
string input = "120 # %60 {a}";
string pattern = @"%(\d+)";
string result = Regex.Replace(input, pattern, "$1%");
Console.WriteLine(result);
The %(\d+) pattern matches a % symbol followed by at least one digit. The digits are captured in a group which is referenced via the $1 in the replacement pattern $1%, which ends up placing the % symbol after the captured number.
If you need to account for numbers with decimal places, such as %60.50, you can use this pattern instead: @"%(\d+(?:\.\d+)?)"
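Put together, the decimal-aware variant might be wrapped like this. It is a sketch only; the PercentFixer class and MoveSign method are illustrative names.

```csharp
using System.Text.RegularExpressions;

// Hypothetical helper that moves a leading % to after the number.
public static class PercentFixer
{
    // % followed by an integer or decimal number, captured in group 1.
    private static readonly Regex LeadingPercent = new Regex(@"%(\d+(?:\.\d+)?)");

    public static string MoveSign(string input)
    {
        // $1 is the captured number; the % is re-emitted after it.
        return LeadingPercent.Replace(input, "$1%");
    }
}
```

For example, MoveSign("%60.50 off") should return 60.50% off, and text without a leading percent sign is returned unchanged.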
