Regex match account number in PDF until new line [duplicate]

This question already has answers here:
Match exact string
(3 answers)
Closed 6 years ago.
I'm working on a PDF scraper in C# and I'm stuck on a regex problem. I want to match only the account number, but my regex matches both the incorrect line and the correct line. I think I have to match everything up to a newline, but I can't find a way to do it.
This is my regex: ([A-Z0-9\-]{5,30})-[0-9]{1,10}-[0-9]{3}
XXX-XX-914026-1558513 // I don't want to match this line
130600298-110-528 // I want to match this line
Thanks in advance!

You have to add anchors:
^([A-Z0-9\-]{5,30})-[0-9]{1,10}-[0-9]{3}$
The ^ at the beginning means start of line and the $ at the end means end of line.
If you don't, the match will be:
XXX-XX-914026-1558513
^^^^^^^^^^^^^^^^^
Also, you don't have to escape a hyphen at the end of a character class, and you can use \d instead of [0-9] (note: \d matches digits from any Unicode script, not only 0-9), which gives:
^([A-Z0-9-]{5,30})-\d{1,10}-\d{3}$
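
A minimal sketch of the anchored pattern applied to multi-line text, as a PDF extraction would typically produce. The sample text and the variable names are assumptions for illustration, and the code uses C# 9 top-level statements. In .NET, ^ and $ match only at the start and end of the whole input unless RegexOptions.Multiline is set. Because $ matches only before '\n', the sketch also allows an optional \r so that Windows line endings do not block the match.

```csharp
using System;
using System.Linq;
using System.Text.RegularExpressions;

// Hypothetical text extracted from a PDF page (assumption, not from the question).
string pageText = "XXX-XX-914026-1558513\r\n130600298-110-528\r\n";

// Multiline makes ^ and $ match at every line boundary, not just the ends of the input.
// \r? tolerates Windows line endings, since $ only matches before '\n'.
var accountPattern = new Regex(
    @"^([A-Z0-9-]{5,30})-\d{1,10}-\d{3}\r?$",
    RegexOptions.Multiline);

// Collect every matching line, dropping the optional trailing '\r'.
string[] accounts = accountPattern.Matches(pageText)
    .Cast<Match>()
    .Select(m => m.Value.TrimEnd('\r'))
    .ToArray();

foreach (string account in accounts)
    Console.WriteLine(account);
```

With this sample only the second line is reported, because the first line has no way to end on exactly three digits after a hyphen.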

Related

C# Regular Expression ends in _DXX where X is number [duplicate]

This question already has answers here:
How do I match an entire string with a regex?
(8 answers)
Closed 4 years ago.
I am trying to create a regular expression pattern in C# that requires the following pattern at the end of the string: _DXX, where X is a digit.
Example:
04R5714A_D15 is correct
04R5714A_D05 is incorrect
04R5714A_D5 is correct
I tried .*_D([1-9]{1}[0-9]?) but it didn't work.

.*_D[1-9]\d?$ should work for you.
.* catches everything up until your underscore
_D is a literal match
[1-9] matches one digit in that range
\d? matches zero or one digit (0-9)
$ is the end of the string
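
The breakdown above can be checked with a short sketch that runs the answer's pattern against the three inputs from the question. The variable names are assumptions, and the code uses C# 9 top-level statements. Note that in .NET $ also matches just before a final newline; \z is the stricter alternative if that matters.

```csharp
using System;
using System.Text.RegularExpressions;

// The answer's pattern: anything, then _D, then one or two digits with no leading zero.
var suffixPattern = new Regex(@".*_D[1-9]\d?$");

// Sample inputs taken from the question.
string[] samples = { "04R5714A_D15", "04R5714A_D05", "04R5714A_D5" };

foreach (string sample in samples)
    Console.WriteLine($"{sample}: {suffixPattern.IsMatch(sample)}");
```

Only the middle input is rejected, since its first digit after _D is 0.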

Regular expression for characters after '.' [duplicate]

This question already has answers here:
How do I match an entire string with a regex?
(8 answers)
Closed 6 years ago.
I need to detect the following format when I enter a serial number such as
CK123456.789
I used a regex with the pattern
^(CV[0-9]{6}\.[0-9]{3}
to match it, but if I enter
CK123456.7890
it still proceeds without flagging an error. Is there a better regular expression that enforces exactly 3 trailing digits after the '.'?

Depending on how you use the regular expression matcher, you might need to enclose it in ^...$ which forces the pattern to be the whole string, i.e.
^CK[0-9]{6}\.[0-9]{3}$ (Note the CK prefix).
I've also removed your leading (mismatched) parenthesis.
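
A small sketch of the difference the closing anchor makes, using the two inputs from the question. The variable names are assumptions, and the code uses C# 9 top-level statements.

```csharp
using System;
using System.Text.RegularExpressions;

// Anchored pattern from the answer: CK, six digits, a dot, exactly three digits.
var serialPattern = new Regex(@"^CK[0-9]{6}\.[0-9]{3}$");

bool validAccepted = serialPattern.IsMatch("CK123456.789");    // correct format
bool tooLongAccepted = serialPattern.IsMatch("CK123456.7890"); // one digit too many

// Without the trailing $, the longer input still matches on its first 12 characters.
var unanchored = new Regex(@"^CK[0-9]{6}\.[0-9]{3}");
bool tooLongUnanchored = unanchored.IsMatch("CK123456.7890");

Console.WriteLine($"{validAccepted} {tooLongAccepted} {tooLongUnanchored}");
```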

What does regex expression match pattern "\\[.*\\]" mean? [duplicate]

This question already has an answer here:
Reference - What does this regex mean?
(1 answer)
Closed 6 years ago.
I am new to regex. What does the regex pattern "\\[.*\\]" mean?
If I have text like "Hello [Here]", the match succeeds and contains [Here].
I read that:
. matches any character except \n (newline),
* means 0 or more times.
I don't understand the "\\". I believe it is just the escape sequence for "\".
So, is the expression "\\[.*\\]" trying to match a pattern like \[Any text\]?

Yes, you are right. It will match any characters enclosed in []. The .* means any characters, or none, between the brackets.
Also, an online regex tester is very helpful: you can enter the regex pattern and check for matches easily.
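
A short sketch of how the backslashes are consumed, assuming the pattern is written as a C# string literal (the question does not say so explicitly). In a regular string each \\ becomes a single backslash, so the regex engine receives \[.*\], where \[ and \] stand for literal brackets. The variable names are assumptions, and the code uses C# 9 top-level statements.

```csharp
using System;
using System.Text.RegularExpressions;

// Both literals produce the same regex text: \[.*\]
string escaped = "\\[.*\\]";   // regular string: each \\ is one backslash
string verbatim = @"\[.*\]";   // verbatim string: backslashes are taken literally

Console.WriteLine(escaped == verbatim);

// The regex engine treats \[ and \] as literal brackets.
Match match = Regex.Match("Hello [Here]", verbatim);
Console.WriteLine(match.Success);
Console.WriteLine(match.Value);
```

The matched text includes the brackets themselves, which agrees with the [Here] result described in the question.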


Filter out alphabetic with regex using C# [duplicate]

This question already has answers here:
Closed 10 years ago.
Possible Duplicate:
Regex - Only letters?
I am trying to filter out alphabetic characters ([a-z], [A-Z]) from text.
I tried "^\w$", but it matches alphanumeric characters (letters and digits).
What pattern filters out alphabetic characters only?
Thanks.

To remove all letters try this:
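using System;
using System.Text.RegularExpressions;
class Program
{
    static void Main()
    {
        var str = "some junk456456%^&%*333";
        Console.WriteLine(Regex.Replace(str, "[a-zA-Z]", "")); // removes every ASCII letter
    }
}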

To match everything except English letters, use:
[^a-zA-Z]+
To match everything except letters of any language, use:
[^\p{L}]+
To reverse the effect and match the letters themselves, remove the caret ^ right after the opening bracket.
If you want to find whole lines that match the pattern, enclose the patterns above in ^ and $; otherwise you don't need them. Note that for the anchors to take effect on every line, you need to create the Regex object with the multiline option (RegexOptions.Multiline) enabled.
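
A brief sketch of the three variants described above: the ASCII-only class, the Unicode letter class, and whole-line matching with the multiline option. The sample strings and variable names are assumptions, and the code uses C# 9 top-level statements.

```csharp
using System;
using System.Linq;
using System.Text.RegularExpressions;

string input = "caf\u00E9 42!";  // hypothetical sample; \u00E9 is the letter é

// Delete everything that is not an ASCII letter (é is removed too).
string asciiLetters = Regex.Replace(input, "[^a-zA-Z]+", "");

// Delete everything that is not a letter in any script (é is kept).
string anyLetters = Regex.Replace(input, @"[^\p{L}]+", "");

// With Multiline, ^ and $ apply to each line, so only all-letter lines are returned.
string[] letterLines = Regex.Matches("abc\n123\ndef", "^[a-zA-Z]+$", RegexOptions.Multiline)
    .Cast<Match>()
    .Select(m => m.Value)
    .ToArray();

Console.WriteLine(asciiLetters);
Console.WriteLine(anyLetters);
Console.WriteLine(string.Join(",", letterLines));
```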

Try this simple way:
var result = Regex.Replace(inputString, @"[^a-zA-Z\s]", "");
Explanation:
[^character_group]
Negation: matches any single character that is not in character_group.
\s
Matches any white-space character.
+
Matches the previous element one or more times (optional here; appending it to the class removes each run of unwanted characters in a single replacement).

To match a string that consists only of one or more letters, use
^[a-zA-Z]+$

How do I write a regex to match a string that doesn't contain a word? [duplicate]

This question already has answers here:
Closed 10 years ago.
Possible Duplicate:
Regular expression to match string not containing a word?
To not match a set of characters I would use e.g. [^\"\\r\\n]*
Now I want to not match a fixed character sequence, e.g. "|="
In other words, I want to match: ( not ", not \r, not \n, and not |= ).
EDIT: I am trying to modify the regex for parsing data separated by delimiters. I got the single-delimiter solution from a CSV parser, but now I want to expand it to include multi-character delimiters. I do not think lookaheads will work, because I want to consume, not just assert and discard, the matching characters.

I figured it out, it should be: ((?![\"\\r\\n]|[|][=]).)*
The full regex, modified from the CSV parser link in the original post, will be: ((?<field>((?![\"\\r\\n]|[|][=]).)*)|\"(?<field>([^\"]|\"\")*)\")([|][=]|(?<rowbreak>\\r\\n|\\n|$))
This will match any number of characters of ( not ", not \r, not \n, and not |= ), or a quoted string, followed by ( "|=" or end of line )
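
A minimal sketch of the full regex in use, written as a verbatim string so that only the quotes need doubling. The input line and variable names are assumptions, and the code uses C# 9 top-level statements. The negative lookahead is checked before every consumed character, which is how the unquoted field stops at the two-character delimiter while the characters themselves are still consumed.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

// The answer's regex; "" inside a verbatim string stands for one quote character.
var fieldPattern = new Regex(
    @"((?<field>((?![""\r\n]|[|][=]).)*)|""(?<field>([^""]|"""")*)"")([|][=]|(?<rowbreak>\r\n|\n|$))");

// Hypothetical input line using |= as the delimiter.
string line = "one|=two words|=\"quoted|=text\"";

var fields = new List<string>();
foreach (Match m in fieldPattern.Matches(line))
{
    // The pattern can also match the empty string at the very end of the input; skip that.
    if (m.Length == 0) continue;
    fields.Add(m.Groups["field"].Value);
}

Console.WriteLine(string.Join(" / ", fields));
```

Skipping zero-length matches is a simplification for this sketch: it would also drop a genuinely empty last field, so a real parser would need to tell those two cases apart.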
