Percentage position move - c#

Is there a simple way to move percentage pointer after the value:
120 # %60 {a} >> 120 # 60% {a}

Try this:
string input = "120 # %60 {a}";
string pattern = #"%(\d+)";
string result = Regex.Replace(input, pattern, "$1%");
Console.WriteLine(result);
The %(\d+) pattern matches a % symbol followed by at least one digit. The digits are captured in a group which is referenced via the $1 in the replacement pattern $1%, which ends up placing the % symbol after the captured number.
If you need to account for numbers with decimal places, such as %60.50, you can use this pattern instead: #"%(\d+(?:\.\d+)?)"

Related

Getting matching result using Regex

I need to find the matching result i.e a string using Regex. Let me demonstrate the scenario using sample inputs.
string input= "xb-cv_107_20190608_032214_006"; // <-1st case
string input = "yb-ha_107_20190608_032214_006__foobar"; // <-2nd case
string input= "fv_vgf_ka01mq3286__20190426_084135_039"; // <-3rd case
string input="fv_vgf_ka01mq3286__2090426_084135_039"; //<-4th case
For 1st case input, output required= "xb-cv_107_20190608_032214_006".
For 2nd case input, output required= "yb-ha_107_20190608_032214_006".
For 3rd case input, output required= "fv_vgf_ka01mq3286__20190426_084135_039".
For 4th case input, output required= null since the pattern does not match.
The procedure to get the output is:
Check using regex if pattern ends with _ followed by 8 decimals followed by '_'
followed by 6 decimals followed by 3 decimals
Or check using regex if pattern ends with _ followed by 8 decimals followed by _ followed by 6 decimals followed by 3 decimals exists followed by __ exists followed by anything random.
Till now, I have come up with this Regex expression:
string pattern = #".+[_][0-9]{8}[_][0-9]{6}[_][0-9]{3}([_]{2})?";
var result = Regex.Match(input, pattern)?.Groups[0].Value ;
You may use
var result = Regex.Match(input, #"^(.+_[0-9]{8}_[0-9]{6}_[0-9]{3})__")?.Groups[1].Value;
Regex details:
^ - start of string
( - Group 1 start:
.+ - any 1+ chars other than LF, as many as possible
_[0-9]{8}_[0-9]{6}_[0-9]{3} - _, 8 digits, _, 6 digits, _, 3 digits
) - end of Group 1
__ - two underscores.
If there is a match, the result holds the value that resides in Group 1.
If there is no match, result is null.

Extracting dollar prices and numbers with comma as thousand separator from PDF converted to text format

I am trying to redact some pdfs with dollar amounts using c#. Below is what I have tried
#"/ (\d)(?= (?:\d{ 3})+(?:\.|$))| (\.\d\d ?)\d *$/ g"
#"(?<=each)(((\d*[,|.]\d{2,3}))*)"
#"(?<=each)(((\d*[,|.]\d{2,3}))*)"
#"\d+\.\d{2}"
Here are some test cases that it needs to match
76,249.25
131,588.00
7.09
21.27
420.42
54.77
32.848
3,056.12
0.009
0.01
32.85
2,948.59
$99,249.25
$9.0000
$1,800.0000
$1,000,000
Here are some test cases that it should not target
666-257-6443
F1A 5G9
Bolt, Locating, M8 x 1.25 x 30 L
Precision Washer, 304 SS, 0.63 OD x 0.31
Flat Washer 300 Series SS; Pack of 50
U-SSFAN 0.63-L6.00-F0.75-B0.64-T0.38-SC5.62
U-CLBUM 0.63-D0.88-L0.875
U-WSSS 0.38-D0.88-T0.125
U-BGHK 6002ZZ - H1.50
U-SSCS 0.38-B0.38
6412K42
Std Dowel, 3/8" x 1-1/2" Lg, Steel
2019.07.05
2092-002.0180
SHCMG 0.25-L1.00
280160717
Please note the c# portion is interfacing with iText 7 pdfSweep.
Guid g = new Guid();
CompositeCleanupStrategy strategy = new CompositeCleanupStrategy();
string guid = g.ToString();
string input = #"C:\Users\JM\Documents\pdftest\61882 _280011434 (1).pdf";
string output = #"C:\Users\JM\Documents\pdftest\61882 _2800011434 (1) x2" + guid+".pdf";
string regex = #"(?m)^\$?[0-9]{1,3}(?:,[0-9]{3})*(?:\.[0-9]+)?$";
strategy.Add(new RegexBasedCleanupStrategy(regex));
PdfDocument pdf = new PdfDocument(new PdfReader(input), new PdfWriter(output));
PdfAutoSweep autoSweep = new PdfAutoSweep(strategy);
autoSweep.CleanUp(pdf);
pdf.Close();
Please share your wisdom
You may use
\$?[0-9]{1,3}(?:,[0-9]{3})*(?:\.[0-9]+)?
Or, if the prices occur on whole lines:
^\$?[0-9]{1,3}(?:,[0-9]{3})*(?:\.[0-9]+)?$
See the regex demo
Bonus: To obtain only price values, you need to remove the ? after \$ to make it obligatory:
\$([0-9]{1,3}(?:,[0-9]{3})*(?:\.[0-9]+)?)
(I added a capturing group in case you need to access the number value separately from the $ char).
If you need to support any currency char, not just $, replace \$ with \p{Sc}.
Details
^ - start of string
\$? - an optional dollar symbol
[0-9]{1,3} - one to three digits
(?:,[0-9]{3})* - any 0 or more repetitions of a comma and then three digits
(?:\.[0-9]+)? - an optional sequence of a dot and then any 1 or more digits
$ - end of string.
C# check for a match:
if (Regex.IsMatch(str, #"^\$?[0-9]{1,3}(?:,[0-9]{3})*(?:\.[0-9]+)?$"))
{
// there is a match
}
pdfSweep notice:
Apply the fix from this answer. The point is that the line breaks are lost when parsing the text. The regex you need then is
#"(?m)^\$?[0-9]{1,3}(?:,[0-9]{3})*(?:\.[0-9]+)?\r?$"
where (?m) makes ^ and $ match start/end of lines and \r? is required as $ only matches before LF, not before CRLF in .NET regex.

How to match string format in Regex c#

I am passing a correct string formate but its not return true.
string dimensionsString= "13.5 inches high x 11.42 inches wide x 16.26 inches deep";
// or 10.1 x 12.5 x 30.9 inches
// or 10.1 x 12.5 x 30.9 inches ; 3.2 pounds
Regex rgxFormat = new Regex(#"^([0-9\.]+) ([a-z]+) x ([0-9\.]+) ([a-z]+) x ([0-9\.]+) ([a-z]+)( ; ([0-9\.]+) ([a-z]+))?$");
if (rgxFormat.IsMatch(dimensionsString))
{
//match
}
I can't understand whats wrong with code ?
Your pattern only accounts for single words after the numbers. Allow any number of symbols there (with .* or .*?) to fix the pattern:
^([0-9.]+) (.*?) x ([0-9\.]+) (.*?) x ([0-9.]+) (.*?)( ; ([0-9.]+) (.*))?$
See the regex demo.
Note that the last .* is used with a greedy quantifier since it is the last unknown bit in the string (to match all the rest of the string). The .*? are non-greedy versions that match as few occurrences of any char but a newline as possible.
Replace regular spaces with \s to match any kind of whitespace if necessary.

Replace floating numbers in math equation with letter variables

I want to replace all the floating numbers from a mathematical expression with letters using regular expressions. This is what I've tried:
Regex rx = new Regex("[-]?([0-9]*[.])?[0-9]+");
string expression = "((-30+5.2)*(2+7))-((-3.1*2.5)-9.12)";
char letter = 'a';
while (rx.IsMatch(expression))
{
expression = rx.Replace(expression , letter.ToString(), 1);
letter++;
}
The problem is that if I have for example (5-2)+3 it will replace it to: (ab)+c
So it gets the -2 as a number but I don't want that.
I am not experienced with Regex but I think I need something like this:
Check for '-', if there is a one, check if there is a number or right parenthesis before it. If there is NOT then save the '-'.
After that check for digits + dot + digits
My above Regex also works with values like: .2 .3 .4 but I don't need that, it should be explicit: 0.2 0.3 0.4
Following the suggested logic, you may consider
(?:(?<![)0-9])-)?[0-9]+(?:\.[0-9]+)?
See the regex demo.
Regex details
(?:(?<![)0-9])-)? - an optional non-capturing group matching 1 or 0 occurrences of
(?<![)0-9]) - a place in string that is not immediately preceded with a ) or digit
- - a minus
[0-9]+ - 1+ digits
(?:\.[0-9]+)? - an optional non-capturing group matching 1 or 0 occurrences of a . followed with 1+ digits.
In code, it is better to use a match evaluator (see the C# demo online):
Regex rx = new Regex(#"(?:(?<![)0-9])-)?[0-9]+(?:\.[0-9]+)?");
string expression = "((-30+5.2)*(2+7))-((-3.1*2.5)-9.12)";
char letter = (char)96; // char before a in ASCII table
string result = rx.Replace(expression, m =>
{
letter++; // char is incremented
return letter.ToString();
}
);
Console.WriteLine(result); // => ((a+b)*(c+d))-((e*f)-g)

C# Regex Grouping

I have to write a function that will get a string and it will have 2 forms:
XX..X,YY..Y where XX..X are max 4 characters and YY..Y are max 26 characters(X and Y are digits or A or B)
XX..X where XX..X are max 8 characters (X is digit or A or B)
e.g. 12A,784B52 or 4453AB
How can i user Regex grouping to match this behavior?
Thanks.
p.s. sorry if this is to localized
You can use named captures for this:
Regex regexObj = new Regex(
#"\b # Match a word boundary
(?: # Either match
(?<X>[AB\d]{1,4}) # 1-4 characters --> group X
, # comma
(?<Y>[AB\d]{1,26}) # 1-26 characters --> group Y
| # or
(?<X>[AB\d]{1,8}) # 1-8 characters --> group X
) # End of alternation
\b # Match a word boundary",
RegexOptions.IgnorePatternWhitespace);
X = regexObj.Match(subjectString).Groups["X"].Value;
Y = regexObj.Match(subjectString).Groups["Y"].Value;
I don't know what happens if there is no group Y, perhaps you might need to wrap the last line in an if statement.

Categories

Resources