Regex to find invalid characters in an integer with whitespace - C#

I want to use a regular expression (regex) to find invalid characters in a string. The string is user input, and when the regex finds invalid characters I want to tell the user which characters were invalid. Example warning message: "Only 0-9 and whitespace allowed. Found invalid characters: ab" when the input was "- 10 a 0 b".
A valid string is:
integer
negative or positive
is allowed to have any amount of whitespace at any position.
So, for example, these VALID strings should NOT match the regex:
"-100"
"- 1 00"
" - 1 00"
"100"
" 1 0 0 "
"1 00"
While the regex should find matches in these INVALID strings:
"- 1 a 0 0 b" should match "a" and "b"
"- 1 a 0 0 -" should match "a" and "-"
I had a working regex for positive integers, until I found out that I had forgotten to include negative integers:
var regex = new Regex(@"[^0-9\s]");
var invalidCharacters = regex.Matches(text);
I have only very basic knowledge of regex. I tried negating the regex to include negative integers, but it is not working:
var regex = new Regex(@"(?!-?[0-9\s])");
I hope someone can help me with this. If this can be solved more easily by dropping the whitespace requirement, feel free to ignore the whitespace part.

I would approach this by thinking about the positive case first (which strings are valid?) and then negating that with a negative lookahead.
I think this meets your requirements:
(?!\s*-?[\d\s]).
\s* will match any whitespace at the start
-? will optionally match a hyphen
[\d\s] will match numbers and whitespace
(?!expression) is a negative lookahead that negates the whole expression
The . at the end is what generates the matches: the negative lookahead is only an assertion and doesn't consume or return any characters.
It produces the desired results for the test cases in your question.
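A minimal sketch of this lookahead answer, written as a small top-level C# program (C# 9 or later). FindInvalid is a hypothetical helper name, and the inputs are the test cases from the question.

```csharp
using System;
using System.Linq;
using System.Text.RegularExpressions;

// Pattern from the answer: the negative lookahead rejects positions where
// "optional whitespace, optional '-', then a digit or whitespace" can start;
// the trailing '.' then consumes the offending character as a match.
var invalidChar = new Regex(@"(?!\s*-?[\d\s]).");

// Hypothetical helper: returns every invalid character found in the input.
string[] FindInvalid(string text) =>
    invalidChar.Matches(text).Cast<Match>().Select(m => m.Value).ToArray();

// Valid inputs from the question: each should print an empty list.
foreach (var s in new[] { "-100", "- 1 00", " - 1 00", "100", " 1 0 0 ", "1 00" })
    Console.WriteLine($"\"{s}\" -> [{string.Join(",", FindInvalid(s))}]");

// Invalid inputs from the question.
Console.WriteLine(string.Join(",", FindInvalid("- 1 a 0 0 b"))); // a,b
Console.WriteLine(string.Join(",", FindInvalid("- 1 a 0 0 -"))); // a,-
```

As written, the lookahead only checks the character right after an optional '-', so a '-' in the middle that is followed by a digit, as in "1-1", is not reported.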

You may use
var invalidCharacters = Regex.Matches(text, @"[^0-9\s-]|(?<!^\s*)-");
See the regex demo (modified slightly, since the demo tests against a single multiline string).
The regex matches:
[^0-9\s-] - a char other than an ASCII digit, any Unicode whitespace char, or a - char
| - or
(?<!^\s*)- - a - char that is not preceded by the start of the string followed by zero or more whitespace chars.
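A sketch that uses this pattern to build the warning message from the question. BuildWarning is a hypothetical helper name, and the snippet is a top-level C# program (C# 9 or later). It relies on .NET supporting variable-length lookbehind such as (?<!^\s*), which some other flavors (for example Python's re module) reject.

```csharp
using System;
using System.Linq;
using System.Text.RegularExpressions;

// Hypothetical helper: returns the feedback message, or null when the input is valid.
static string? BuildWarning(string text)
{
    // [^0-9\s-]   : anything that is not an ASCII digit, whitespace or '-'
    // (?<!^\s*)-  : a '-' that is NOT the leading sign (after optional leading whitespace)
    var matches = Regex.Matches(text, @"[^0-9\s-]|(?<!^\s*)-");
    if (matches.Count == 0) return null;

    string invalid = string.Concat(matches.Cast<Match>().Select(m => m.Value));
    return $"Only 0-9 and whitespace allowed. Found invalid characters: {invalid}";
}

Console.WriteLine(BuildWarning("- 10 a 0 b"));
// Only 0-9 and whitespace allowed. Found invalid characters: ab
Console.WriteLine(BuildWarning(" - 1 00") ?? "valid");
// valid
```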

Related

Regex match multiple digits after '-'

This seems like it should be easy, but I'm not so good with regex, and this doesn't seem to be easy to find on google.
I need a regex for strings that start with 'SP-multiple digits' and end with '-multiple digits'.
For example, I have to match '-12' in "Sp-1234-12".
My attempt was [^*-]*$, but it matches everything after the last minus, and I need the minus included.
For that digit and hyphen format, you could use a capture group for the part of the string that you want:
^Sp(?:-\d+)*(-\d+)$
Explanation
^ Start of string
Sp Match literally
(?:-\d+)* Optionally repeat - and 1+ digits
(-\d+) Capture group 1, match - and 1+ digits
$ End of string
Note that in C#, \d matches any Unicode decimal digit; use [0-9] instead of \d to match only the ASCII digits 0-9.
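A small sketch of reading capture group 1 in C#, written as a top-level program. LastSegment is a hypothetical helper name, and the pattern swaps \d for [0-9] as the note suggests.

```csharp
using System;
using System.Text.RegularExpressions;

// Hypothetical helper: returns the last "-digits" segment of an "Sp-..." code,
// or null when the whole string does not have that shape.
static string? LastSegment(string code)
{
    // (?:-[0-9]+)* eats the earlier segments; group 1 keeps only the final one.
    var m = Regex.Match(code, @"^Sp(?:-[0-9]+)*(-[0-9]+)$");
    return m.Success ? m.Groups[1].Value : null;
}

Console.WriteLine(LastSegment("Sp-1234-12"));           // -12
Console.WriteLine(LastSegment("Sp-7"));                 // -7
Console.WriteLine(LastSegment("Sp-12a") ?? "no match"); // no match
```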

Replace [0-9] with 0[0-9]

I would like to know if there is a way to add a 0 in front of every 1 digit number using regex.
The input string is a list of 1-3 numbers separated by a '-' (e.g. 28, 12-56, 67-6, 45-65-46).
The two issues I have come across are:
Matching all the possible 1-digit numbers without doing a separate check for each of the following: ^[0-9]-, -[0-9]-, ^[0-9]$, -[0-9]$
Adding the 0 without removing anything else. So turn: Regex.Replace(input, "^[0-9]-","0") into something like: Regex.Replace(input, "^[0-9]-","^0[0-9]-")
search: (?<!\d)(\d)(?!\d)
replace 0$1
Where:
(?<!\d) is a negative lookbehind that makes sure there is no digit before the match.
(\d) is a capture group, referenced by $1 in the replacement part.
(?!\d) is a negative lookahead that makes sure there is no digit after the match.
See lookaround for info.
This replaces every digit that is neither preceded nor followed by another digit with 0 followed by that digit.
string output = Regex.Replace(input, @"(?<!\d)(\d)(?!\d)", "0$1");
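A runnable sketch of this replacement applied to the sample inputs from the question, as a top-level C# program. PadSingleDigits is a hypothetical helper name; the pattern and replacement are the ones from the answer.

```csharp
using System;
using System.Text.RegularExpressions;

// Hypothetical helper: pads every standalone single digit with a leading zero.
// (?<!\d) and (?!\d) ensure the digit has no digit neighbour on either side;
// $1 in the replacement is the captured digit.
static string PadSingleDigits(string input) =>
    Regex.Replace(input, @"(?<!\d)(\d)(?!\d)", "0$1");

foreach (var s in new[] { "28", "12-56", "67-6", "45-65-46", "1-2-3" })
    Console.WriteLine($"{s} -> {PadSingleDigits(s)}");
// 28 -> 28
// 12-56 -> 12-56
// 67-6 -> 67-06
// 45-65-46 -> 45-65-46
// 1-2-3 -> 01-02-03
```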

Parsing text between quotes with .NET regular expressions

I have the following input text:
@"This is some text #foo=bar #name=""John \""The Anonymous One\"" Doe"" #age=38"
I would like to parse the values with the #name=value syntax as name/value pairs. Parsing the previous string should result in the following named captures:
name:"foo"
value:"bar"
name:"name"
value:"John \""The Anonymous One\"" Doe"
name:"age"
value:"38"
I tried the following regex, which got me almost there:
@"(?:(?<=\s)|^)#(?<name>\w+[A-Za-z0-9_-]+?)\s*=\s*(?<value>[A-Za-z0-9_-]+|(?="").+?(?=(?<!\\)""))"
The primary issue is that it captures the opening quote in "John \""The Anonymous One\"" Doe". I feel like this should be a lookbehind instead of a lookahead, but that doesn't seem to work at all.
Here are some rules for the expression:
Name must start with a letter and can contain any letter, number, underscore, or hyphen.
An unquoted value must have at least one character and can contain any letter, number, underscore, or hyphen.
A quoted value can contain any character, including whitespace and escaped quotes.
Edit:
Here's the result from regex101.com:
(?:(?<=\s)|^)#(?<name>\w+[A-Za-z0-9_-]+?)\s*=\s*(?<value>(?<!")[A-Za-z0-9_-]+|(?=").+?(?=(?<!\\)"))
(?:(?<=\s)|^) Non-capturing group
# matches the character # literally
(?<name>\w+[A-Za-z0-9_-]+?) Named capturing group name
\s* match any white space character [\r\n\t\f ]
= matches the character = literally
\s* match any white space character [\r\n\t\f ]
Quantifier: * Between zero and unlimited times, as many times as possible, giving back as needed [greedy]
(?<value>(?<!")[A-Za-z0-9_-]+|(?=").+?(?=(?<!\\)")) Named capturing group value
1st Alternative: [A-Za-z0-9_-]+
[A-Za-z0-9_-]+ match a single character present in the list below
Quantifier: + Between one and unlimited times, as many times as possible, giving back as needed [greedy]
A-Z a single character in the range between A and Z (case sensitive)
a-z a single character in the range between a and z (case sensitive)
0-9 a single character in the range between 0 and 9
_- a single character in the list _- literally
2nd Alternative: (?=").+?(?=(?<!\\)")
(?=") Positive Lookahead - Assert that the regex below can be matched
" matches the characters " literally
.+? matches any character (except newline)
Quantifier: +? Between one and unlimited times, as few times as possible, expanding as needed [lazy]
(?=(?<!\\)") Positive Lookahead - Assert that the regex below can be matched
(?<!\\) Negative Lookbehind - Assert that it is impossible to match the regex below
\\ matches the character \ literally
" matches the characters " literally
You can use a very useful .NET regex feature where multiple same-named captures are allowed. Also, there is an issue with your (?<name>) capture group: it allows a digit in the first position, which does not meet your 1st requirement.
So, I suggest:
(?si)(?:(?<=\s)|^)#(?<name>\w+[a-z0-9_-]+?)\s*=\s*(?:(?<value>[a-z0-9_-]+)|(?:"")?(?<value>.+?)(?=(?<!\\)""))
Note that you cannot debug .NET-specific regexes at regex101.com; you need to test them in a .NET-compliant environment.
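A sketch of how the suggested pattern can be consumed from C#, as a top-level program. ParsePairs is a hypothetical helper name; the pattern is copied from the answer and the input from the question.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

// Hypothetical helper. Both alternatives of the pattern capture into a group
// named "value"; .NET allows reusing a group name, and Groups["value"] holds
// whichever alternative actually matched.
static List<(string Name, string Value)> ParsePairs(string text)
{
    // Pattern from the answer, as a verbatim string ("" is one quote character).
    string pattern = @"(?si)(?:(?<=\s)|^)#(?<name>\w+[a-z0-9_-]+?)\s*=\s*(?:(?<value>[a-z0-9_-]+)|(?:"")?(?<value>.+?)(?=(?<!\\)""))";

    var pairs = new List<(string Name, string Value)>();
    foreach (Match m in Regex.Matches(text, pattern))
        pairs.Add((m.Groups["name"].Value, m.Groups["value"].Value));
    return pairs;
}

string input = @"This is some text #foo=bar #name=""John \""The Anonymous One\"" Doe"" #age=38";
foreach (var (name, value) in ParsePairs(input))
    Console.WriteLine($"{name} = {value}");
// foo = bar
// name = John \"The Anonymous One\" Doe
// age = 38
```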
Use string methods, starting with Split:
string myLongString = @"This is some text #foo=bar #name=""John \""The Anonymous One\"" Doe"" #age=38";
string[] nameValues = myLongString.Split('#');
From there, either split each piece again on '=' or use IndexOf('=').
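A minimal sketch of the Split/IndexOf approach as a top-level C# program. SplitPairs is a hypothetical helper name, and it assumes no '#' ever appears inside a value (a quoted value containing '#' would break this approach).

```csharp
using System;
using System.Collections.Generic;

// Hypothetical helper sketching the string-method approach.
// Assumption: '#' never appears inside a value.
static Dictionary<string, string> SplitPairs(string text)
{
    var result = new Dictionary<string, string>();
    string[] pieces = text.Split('#');

    // pieces[0] is whatever precedes the first '#', so start at 1.
    for (int i = 1; i < pieces.Length; i++)
    {
        int eq = pieces[i].IndexOf('=');
        if (eq <= 0) continue; // no "name=" part in this piece

        string name = pieces[i].Substring(0, eq).Trim();
        string value = pieces[i].Substring(eq + 1).Trim();

        // Remove one pair of surrounding quotes from a quoted value.
        if (value.Length >= 2 && value[0] == '"' && value[value.Length - 1] == '"')
            value = value.Substring(1, value.Length - 2);

        result[name] = value;
    }
    return result;
}

var pairs = SplitPairs(@"This is some text #foo=bar #name=""John \""The Anonymous One\"" Doe"" #age=38");
foreach (var kv in pairs)
    Console.WriteLine($"{kv.Key} = {kv.Value}");
```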

Regular Expression: Section names with unknown length?

I have a text block that is formatted like this:
1.2.3.4.5 or 1.2222.3.4.5 or 1 or 1.2 etc
An unknown number of numbers and dots (sections of a legal document).
How can I capture the full section (1.2.3.4.5) into a group?
I use C#, but any regex is fine; I can translate it.
UPDATED
Use this Regex:
Regex.Matches(inputString, @"\d[\.\d]*(?<!\.)");
explain:
\d digits (0-9)
[.\d]* any character of: '.', digits (0-9)
(0 or more times, matching as many as possible)
(?<!\.) Zero-width negative lookbehind assertion: the match cannot end with a dot.
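A short sketch of this pattern on the sample text, as a top-level C# program. SectionNumbers is a hypothetical helper name; the second input is an illustrative sentence showing that a trailing dot is dropped.

```csharp
using System;
using System.Linq;
using System.Text.RegularExpressions;

// \d       : a section starts with a digit
// [\.\d]*  : followed by any run of dots and digits
// (?<!\.)  : the match may not end on a dot, so the engine backtracks past it
static string[] SectionNumbers(string text) =>
    Regex.Matches(text, @"\d[\.\d]*(?<!\.)").Cast<Match>().Select(m => m.Value).ToArray();

Console.WriteLine(string.Join(" | ", SectionNumbers("1.2.3.4.5 or 1.2222.3.4.5 or 1 or 1.2 etc")));
// 1.2.3.4.5 | 1.2222.3.4.5 | 1 | 1.2
Console.WriteLine(string.Join(" | ", SectionNumbers("See section 3.1. for details")));
// 3.1
```

Unlike \d+(\.\d+)* in the next answer, this pattern also accepts consecutive dots such as 1..2.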
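// Requires: using System.Linq; using System.Text.RegularExpressions;
string s = "1.2.3.4.5 or 1.2222.3.4.5 or 1 or 1.2 or 2222.3333.111.5 etc";
var matches = Regex.Matches(s, @"\d+(\.\d+)*").Cast<Match>()
    .Select(m => m.Value)
    .ToArray();
// matches: "1.2.3.4.5", "1.2222.3.4.5", "1", "1.2", "2222.3333.111.5"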
Well, if you know the depth can't go beyond 5, then you can do:
@"1+((\.2+)((\.3+)((\.4+)(\.5+)?)?)?)?"
where 1 to 5 stand for the pattern of each level and \. matches a literal dot. You can expand that pattern for every symbol, up to a finite number of symbols.
The + means at least one occurrence of the symbol. If 0 is valid, you can use * instead.
Put ?: after an opening parenthesis if you don't want the group to be captured,
for example: (?:abc)
I omitted them to make the regex more readable.
The ? after a closing parenthesis means 0 or 1 of the preceding group.
Now if you don't know how deep your string can go, for instance
"1.2.3.4......252525.262626.272727.......n.n.n", then my intuition tells me that you can't do that with regex.
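The nested optional groups in the answer above can be written more compactly with an explicit {min,max} quantifier. This is a swapped-in variant rather than the answer's own pattern, and it assumes each level is a run of ASCII digits with at most five levels.

```csharp
using System;
using System.Text.RegularExpressions;

// Assumption: each level is one or more ASCII digits, at most 5 levels in total.
// {0,4} allows zero to four extra ".number" parts after the first number.
var upToFiveLevels = new Regex(@"^[0-9]+(?:\.[0-9]+){0,4}$");

foreach (var s in new[] { "1", "1.2", "1.2222.3.4.5", "1.2.3.4.5.6", "1." })
    Console.WriteLine($"{s}: {upToFiveLevels.IsMatch(s)}");
// 1: True
// 1.2: True
// 1.2222.3.4.5: True
// 1.2.3.4.5.6: False
// 1.: False
```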

validate string based on a format

I have a string that must be on the following format:
XXXX-XX-XXX-XXXX-XXXXXXXXXX-X
where X is an integer. The number of integers doesn't matter. I just need to make sure that the string:
starts and ends with an integer
contains only integers separated by dashes
what would be the easiest way to validate that?
This regexp should do the trick. It uses a negative lookbehind to avoid matching multiple dashes in a row.
^\d(\d|(?<!-)-)*\d$|^\d$
^\d              starts with a digit
(\d|(?<!-)-)*    consists of digits, or of dashes not preceded by a dash
\d$              ends with a digit
|^\d$            or is a single digit
Here is C# code that illustrates its use (also on ideone):
var r = new Regex("^\\d(\\d|(?<!-)-)*\\d$|^\\d$");
Console.WriteLine(r.IsMatch("1-2-3"));      // True
Console.WriteLine(r.IsMatch("1-222-3333")); // True
Console.WriteLine(r.IsMatch("123"));        // True
Console.WriteLine(r.IsMatch("1-2-3-"));     // False
Console.WriteLine(r.IsMatch("1"));          // True
Console.WriteLine(r.IsMatch("-11-2-3-"));   // False
Use a regular expression.
^\d[-0-9]+\d$
This assumes the string is at least three characters long.
Breakdown:
^ - match start of string
\d - match a digit
[ - start of character class containing:
- - a dash
0-9 - 0 to 9
] - end of character class
+ - match one or more of the previous
\d - match a digit
$ - match end of string
You can change the + to * to make two-digit strings valid, and add an alternation to make one-digit strings valid as well:
^(\d|\d[-0-9]*\d)$
Note: In .NET, \d will match any Unicode digit (so, for example, Arabic digits will match) - if you don't want that, replace \d with [0-9] in every place.
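A sketch illustrating the note as a top-level C# program: the answer's pattern with \d accepts Arabic-Indic digits, while the [0-9] version (or RegexOptions.ECMAScript, which makes \d equivalent to [0-9]) rejects them. The Arabic-Indic sample string is an illustrative input.

```csharp
using System;
using System.Text.RegularExpressions;

// The answer's pattern: \d matches any Unicode decimal digit in .NET.
var unicodeDigits = new Regex(@"^(\d|\d[-0-9]*\d)$");
// Same pattern restricted to ASCII digits, as the note suggests.
var asciiDigits = new Regex(@"^([0-9]|[0-9][-0-9]*[0-9])$");
// Alternative: ECMAScript mode makes \d equivalent to [0-9].
var ecmaDigits = new Regex(@"^(\d|\d[-0-9]*\d)$", RegexOptions.ECMAScript);

string arabicIndic = "\u0661-\u0662"; // "١-٢" (Arabic-Indic digits one and two)

Console.WriteLine(unicodeDigits.IsMatch("1-22-333")); // True
Console.WriteLine(asciiDigits.IsMatch("1-22-333"));   // True
Console.WriteLine(unicodeDigits.IsMatch(arabicIndic)); // True
Console.WriteLine(asciiDigits.IsMatch(arabicIndic));   // False
Console.WriteLine(ecmaDigits.IsMatch(arabicIndic));    // False
```

Note that this pattern, unlike the (?<!-) version in the first answer, also accepts consecutive dashes such as 1--2.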
You can write a regular expression that does the trick, then use it to validate your string. Some regex syntax for reference:
^ ---->Start of a string.
$ ---->End of a string.
. ----> Any character (except \n newline)
{...}----> Explicit quantifier notation.
[...] ---->Explicit set of characters to match.
(...) ---->Logical grouping of part of an expression.
* ---->0 or more of previous expression.
+ ---->1 or more of previous expression.
? ---->0 or 1 of previous expression; also forces minimal matching when an expression might match several strings within a search string.
\ ---->Preceding one of the above, it makes it a literal instead of a special character. Preceding a special matching character, see below.
\w ----> matches any word character, roughly [a-zA-Z0-9_] (in .NET it also matches Unicode letters and digits)
\W ----> matches any non-word character, the negation of \w
\s ----> matches any whitespace character, including [ \f\n\r\t\v]
\S ----> matches any non-whitespace character, the negation of \s
\d ----> matches any decimal digit; in .NET this is any Unicode digit (\p{Nd}), equivalent to [0-9] only with RegexOptions.ECMAScript
\D ----> matches any non-digit character, the negation of \d
\a ----> Matches a bell (alarm) \u0007.
\b ----> Matches a backspace \u0008 inside a [] character class; outside one, it matches a word boundary.
\t ---->Matches a tab \u0009.
\r ---->Matches a carriage return \u000D.
\v ---->Matches a vertical tab \u000B.
\f ---->Matches a form feed \u000C.
\n ---->Matches a new line \u000A.
\e ---->Matches an escape \u001B
$number ----> Substitutes the last substring matched by group number number (decimal).
${name} ----> Substitutes the last substring matched by a (?<name> ) group.
$$ ----> Substitutes a single "$" literal.
$& ----> Substitutes a copy of the entire match itself.
$` ----> Substitutes all the text of the input string before the match.
$' ----> Substitutes all the text of the input string after the match.
$+ ----> Substitutes the last group captured.
$_ ----> Substitutes the entire input string.
(?(expression)yes|no) ----> Matches the yes part if expression matches; otherwise matches the optional no part.
More info at:
http://geekswithblogs.net/brcraju/archive/2003/10/23/235.aspx
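A quick sketch of the substitution tokens from the list above, using Regex.Replace in a top-level C# program. The sample strings are illustrative only.

```csharp
using System;
using System.Text.RegularExpressions;

// $number: swap "year-month" into "month/year".
Console.WriteLine(Regex.Replace("2024-05", @"(\d+)-(\d+)", "$2/$1"));                              // 05/2024
// ${name}: reorder named groups.
Console.WriteLine(Regex.Replace("John Smith", @"(?<first>\w+) (?<last>\w+)", "${last}, ${first}")); // Smith, John
// $$ is a literal dollar sign, $& is the whole match.
Console.WriteLine(Regex.Replace("Price: 10", @"\d+", "$$$&"));                                     // Price: $10
// $` and $' are the input text before and after the match.
Console.WriteLine(Regex.Replace("abc", "b", "[$`|$']"));                                           // a[a|c]c
// $_ is the entire input string.
Console.WriteLine(Regex.Replace("ab", "b", "($_)"));                                               // a(ab)
```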
A regular expression is probably the way to go; this might help:
http://www.regular-expressions.info/creditcard.html
