validate string based on a format - c#

I have a string that must be on the following format:
XXXX-XX-XXX-XXXX-XXXXXXXXXX-X
where X is an integer. The number of integers don't matter. I just need to make sure that the string:
starts and ends with an integer
contains only integers separated by dashes
what would be the easiest way to validate that?

This regexp should do the trick. It uses a negative lookbehind to avoid matching multiple dashes in a row.
^\d(\d|(?<!-)-)*\d$|^\d$
^ ^ ^ ^
| | | -- is a single digit, or
| | ------- ends with a digit
| ----------------consists on digits or dashes not preceded by dashes
---------------------starts with a digit
Here is a C# code that illustrates its use (also on ideone):
var r = new Regex("^\\d(\\d|(?<!-)-)*\\d$|^\\d$");
Console.WriteLine(r.IsMatch("1-2-3"));
Console.WriteLine(r.IsMatch("1-222-3333"));
Console.WriteLine(r.IsMatch("123"));
Console.WriteLine(r.IsMatch("1-2-3-"));
Console.WriteLine(r.IsMatch("1"));
Console.WriteLine(r.IsMatch("-11-2-3-"));

Use a regular expression.
^\d[-0-9]+\d$
This assumes the string is at least three characters long.
Breakdown:
^ - match start of string
\d - match a digit
[ - start of character class containing:
- - a dash
0-9 - 0 to 9
] - end of character class
+ - match one or more of the previous
\d - match a digit
$ - match end of string
You can change the + to * to make 2 digit strings valid, and add an alternation to make 1 digit strings valid as well:
^(\d|\d[-0-9]*\d)$
Note: In .NET, \d will match any Unicode digit (so, for example, Arabic digits will match) - if you don't want that, replace \d with [0-9] in every place.

you can write a regular expression that does the trick.
Than you can use that regular expression to validate your string
^ ---->Start of a string.
$ ---->End of a string.
. ----> Any character (except \n newline)
{...}----> Explicit quantifier notation.
[...] ---->Explicit set of characters to match.
(...) ---->Logical grouping of part of an expression.
* ---->0 or more of previous expression.
+ ---->1 or more of previous expression.
? ---->0 or 1 of previous expression; also forces minimal matching when an expression might match several strings within a search string.
\ ---->Preceding one of the above, it makes it a literal instead of a special character. Preceding a special matching character, see below.
\w ----> matches any word character, equivalent to [a-zA-Z0-9]
\W ----> matches any non word character, equivalent to [^a-zA-Z0-9].
\s ----> matches any white space character, equivalent to [\f\n\r\v]
\S----> matches any non-white space characters, equivalent to [^\f\n\r\v]
\d ----> matches any decimal digits, equivalent to [0-9]
\D----> matches any non-digit characters, equivalent to [^0-9]
\a ----> Matches a bell (alarm) \u0007.
\b ----> Matches a backspace \u0008 if in a [] character class; otherwise, see the note following this table.
\t ---->Matches a tab \u0009.
\r ---->Matches a carriage return \u000D.
\v ---->Matches a vertical tab \u000B.
\f ---->Matches a form feed \u000C.
\n ---->Matches a new line \u000A.
\e ---->Matches an escape \u001B
$number ----> Substitutes the last substring matched by group number number (decimal).
${name} ----> Substitutes the last substring matched by a (? ) group.
$$ ----> Substitutes a single "$" literal.
$& ----> Substitutes a copy of the entire match itself.
$` ----> Substitutes all the text of the input string before the match.
$' ----> Substitutes all the text of the input string after the match.
$+ ----> Substitutes the last group captured.
$_ ----> Substitutes the entire input string.
(?(expression)yes|no) ----> Matches yes part if expression matches and no part will be ommited.
more info at
http://geekswithblogs.net/brcraju/archive/2003/10/23/235.aspx

Regular expression is probably the way to go this might help:
http://www.regular-expressions.info/creditcard.html

Related

Regex to find invalid characters in integer with whitespaces

I want to use regular expression (regex) to find invalid characters in a string. The string is a user input and when the regex finds invalid characters I want to give the user feedback which characters where invalid. Example warning message: "Only 0-9 and whitespace allowed. Found invalid characters: ab" when input was "- 10 a 0 b".
A valid string is:
integer
negative or positive
is allowed to have any amount of whitespace at any position.
So for example those VALID strings should NOT match the regex:
"-100"
"- 1 00"
" - 1 00"
"100"
" 1 0 0 "
"1 00"
While the regex should find matches in these INVALID strings:
"- 1 a 0 0 b" should match "a" and "b"
"- 1 a 0 0 -" should match "a" and "-"
I had a working regex for positive integers, until i found out that I forgot to include negative integers:
var regex = new Regex(#"[^0-9\s]")
var invalidCharacters = regex.Matches(text)
I have only very basic knowledge of regex. I tried out negating the regex to include negative integers, but it is not working:
new Regex(#"(?!-?[0-9\s])")
I hope someone can help me with this. If this can be solved easier by removing the whitespace requirement. Then please feel free to ignore the whitespace part.
I would approach this by thinking about the positive case first - which strings are valid? And then negate that with a negative lookaround.
I think this meets your requirements:
(?!\s*-?[\d\s]).
\s* will match any whitespace at the start
-? will optionally match a hyphen
[\d\s] will match numbers and whitespace
(?!expression) is a negative lookaround to negate the whole expression
The . at the end is a way to generate matches. The negative lookaround is just an assertion - it doesn't return any results.
It produces the desired results for the test cases in your question.
You may use
var invalidCharacters = Regex.Matches(text, #"[^0-9\s-]|(?<!^\s*)-");
See the regex demo (modified a bit as the demo is a test against a single multiline string.)
The regex matches:
[^0-9\s-] - a char other than an ASCII digit, any Unicode whitespace char or a - char
| - or
(?<!^\s*)- - a - char that is not preceded with the start of string any any 0+ whitespace chars.

How to create a repeating non-capturing group?

I'm trying to create what I think is a repeating non-capturing group, and I just can't figure out how.
In plain words, I want to match:
Any number which is both
Preceded by any amount of blocks that doesn't contain a space, but is not either just a number.
Followed by any amount of blocks that doesn't contain a space, but is not either just a number.
Here is what I tried:
Pattern: (?:\w.)+(\d+)(?:.\w+)+
Test Set:
3.AAA
AAA.BBB
AAA.3.BBB
AAA.3.B555B
AAA.3.BBB.4
AAA.3.BBB.4.CCC
AAA.3.BBB.CCC
AAA.3.BBB.CCC.4
AAA.3.BBB.CCC.4.DDD
ZZZ.AAA.3.BBB
ZZZ.AAA.3.BBB.4
ZZZ.AAA.3.BBB.4.CCC
ZZZ.AAA.3.BBB.CCC
ZZZ.AAA.3.BBB.CCC.4
ZZZ.AAA.3.BBB.CCC.4.DDD
I would want it to match only to:
AAA.3.BBB
AAA.3.B555B
AAA.3.BBB.CCC
ZZZ.AAA.3.BBB
ZZZ.AAA.3.BBB.CCC
Note: I saw some other posts asking the same-ish question, but I can't use the answers because they were all like "Instead of trying to repeat a group, just match 'this' and it will work for your specific case".
Code
See regex in use here
^(?:(?!(?:\.|^)\d+\.)\S)+\.\d+\.(?:(?!\.\d+(?:\.|$))\S)+$
Results
Input
3.AAA
AAA.BBB
AAA.3.BBB
AAA.3.B555B
AAA.3.BBB.4
AAA.3.BBB.4.CCC
AAA.3.BBB.CCC
AAA.3.BBB.CCC.4
AAA.3.BBB.CCC.4.DDD
ZZZ.AAA.3.BBB
ZZZ.AAA.3.BBB.4
ZZZ.AAA.3.BBB.4.CCC
ZZZ.AAA.3.BBB.CCC
ZZZ.AAA.3.BBB.CCC.4
ZZZ.AAA.3.BBB.CCC.4.DDD
Output
AAA.3.BBB
AAA.3.B555B
AAA.3.BBB.CCC
ZZZ.AAA.3.BBB
ZZZ.AAA.3.BBB.CCC
Explanation
^ Assert position at the start of the line
(?:(?!(?:\.|^)\d+\.)\S)+ Match the following one or more times
(?!(?:\.|^)\d+\.) Negative lookahead ensuring what follows doesn't match
(?:\.|^) Match either of the following
\. Match a literal dot character .
^ Assert position at the start of the line
\d+ Match one or more digits
\. Match a literal dot character .
\S Match any non-whitespace character
\. Match a literal dot character .
\d+ Match one or more digits
\. Match a literal dot chracter .
(?:(?!\.\d+(?:\.|$))\S)+ Match the following one or more times
(?!\.\d+(?:\.|$)) Negative lookahead ensuring what follows doesn't match
\. Match a literal dot chracter .
\d+ Match one or more digits
(?:\.|$) Match either of the following
\. Match a literal dot chracter .
$ Assert position at the end of the line
\S Match any non-whitespace character
$ Assert position at the end of the line
There is a bit simpler solution:
^(?:(?!\d+\.)\w+\.)+\d+(?:\.(?!\d+(?=\.|$))\w+)+$
See the .NET regex demo (since it is a multiline demo, \r? has to be added before $, it is not necessary when matching standalone strings).
Details
^ - start of string
(?:(?!\d+\.)\w+\.)+ - 1 or more occurrences (due to (?:...)+) of any 1+ word chars (letters, digits, _ - due to \w+) that are not all digits followed with a dot (note that to match only letters and digits, you need to use [\w-[_]] or [^\W_] instead of \w, or if you are really after matching the blocks that may even have symbols or punctuation, replace \w with [^\s.] - any char but whitespace or dot)
\d+ - 1 or more digits
(?:\.(?!\d+(?=\.|$))\w+)+ - 1 or more occurrences of
\. - a dot
(?!\d+(?=\.|$)) - not followed with 1+ digits (\d+) followed with a dot or end of string
\w+ - 1 or more word chars
$ - end of string.
C# demo:
using System.Collections.Generic;
using System.Text.RegularExpressions;
public class Test
{
public static void Main()
{
var lst = new List<string> {"3.AAA", "AAA.BBB", "AAA.3.BBB", "AAA.3.B555B", "AAA.3.BBB.4",
"AAA.3.BBB.4.CCC", "AAA.3.BBB.CCC", "AAA.3.BBB.CCC.4", "AAA.3.BBB.CCC.4.DDD",
"ZZZ.AAA.3.BBB","ZZZ.AAA.3.BBB.4","ZZZ.AAA.3.BBB.4.CCC", "ZZZ.AAA.3.BBB.CCC",
"ZZZ.AAA.3.BBB.CCC.4", "ZZZ.AAA.3.BBB.CCC.4.DDD"};
var rx = new Regex(#"^(?:(?!\d+\.)[^\s.]+\.)+\d+(?:\.(?!\d+(?=\.|$))[^\s.]+)+$",
RegexOptions.Compiled | RegexOptions.ECMAScript);
foreach (var s in lst)
{
if (rx.IsMatch(s))
Console.WriteLine(s);
}
}
}
Results:
AAA.3.BBB
AAA.3.B555B
AAA.3.BBB.CCC
ZZZ.AAA.3.BBB
ZZZ.AAA.3.BBB.CCC

Parsing text between quotes with .NET regular expressions

I have the following input text:
#"This is some text #foo=bar #name=""John \""The Anonymous One\"" Doe"" #age=38"
I would like to parse the values with the #name=value syntax as name/value pairs. Parsing the previous string should result in the following named captures:
name:"foo"
value:"bar"
name:"name"
value:"John \""The Anonymous One\"" Doe"
name:"age"
value:"38"
I tried the following regex, which got me almost there:
#"(?:(?<=\s)|^)#(?<name>\w+[A-Za-z0-9_-]+?)\s*=\s*(?<value>[A-Za-z0-9_-]+|(?="").+?(?=(?<!\\)""))"
The primary issue is that it captures the opening quote in "John \""The Anonymous One\"" Doe". I feel like this should be a lookbehind instead of a lookahead, but that doesn't seem to work at all.
Here are some rules for the expression:
Name must start with a letter and can contain any letter, number, underscore, or hyphen.
Unquoted must have at least one character and can contain any letter, number, underscore, or hyphen.
Quoted value can contain any character including any whitespace and escaped quotes.
Edit:
Here's the result from regex101.com:
(?:(?<=\s)|^)#(?<name>\w+[A-Za-z0-9_-]+?)\s*=\s*(?<value>(?<!")[A-Za-z0-9_-]+|(?=").+?(?=(?<!\\)"))
(?:(?<=\s)|^) Non-capturing group
# matches the character # literally
(?<name>\w+[A-Za-z0-9_-]+?) Named capturing group name
\s* match any white space character [\r\n\t\f ]
= matches the character = literally
\s* match any white space character [\r\n\t\f ]
Quantifier: * Between zero and unlimited times, as many times as possible, giving back as needed [greedy]
(?<value>(?<!")[A-Za-z0-9_-]+|(?=").+?(?=(?<!\\)")) Named capturing group value
1st Alternative: [A-Za-z0-9_-]+
[A-Za-z0-9_-]+ match a single character present in the list below
Quantifier: + Between one and unlimited times, as many times as possible, giving back as needed [greedy]
A-Z a single character in the range between A and Z (case sensitive)
a-z a single character in the range between a and z (case sensitive)
0-9 a single character in the range between 0 and 9
_- a single character in the list _- literally
2nd Alternative: (?=").+?(?=(?<!\\)")
(?=") Positive Lookahead - Assert that the regex below can be matched
" matches the characters " literally
.+? matches any character (except newline)
Quantifier: +? Between one and unlimited times, as few times as possible, expanding as needed [lazy]
(?=(?<!\\)") Positive Lookahead - Assert that the regex below can be matched
(?<!\\) Negative Lookbehind - Assert that it is impossible to match the regex below
\\ matches the character \ literally
" matches the characters " literally
You can use a very useful .NET regex feature where multiple same-named captures are allowed. Also, there is an issue with your (?<name>) capture group: it allows a digit in the first position, which does not meet your 1st requirement.
So, I suggest:
(?si)(?:(?<=\s)|^)#(?<name>\w+[a-z0-9_-]+?)\s*=\s*(?:(?<value>[a-z0-9_-]+)|(?:"")?(?<value>.+?)(?=(?<!\\)""))
See demo
Note that you cannot debug .NET-specific regexes at regex101.com, you need to test them in .NET-compliant environment.
Use string methods.
Split
string myLongString = ""#"This is some text #foo=bar #name=""John \""The Anonymous One\"" Doe"" #age=38"
string[] nameValues = myLongString.Split('#');
From there either use Split function with "=" or use IndexOf("=").

Regular Expression for no repeating special characters (C#)

I am new to regular expressions and need a regular expression for address, in which user cannot enter repeating special characters such as: ..... or ,,,.../// etc and none of the special characters could be entered more than 5 times in the string.
...,,,....// =>No Match
Street no. 40. hello. =>Match
Thanks in advance!
I have tried this:
([a-zA-Z]+|[\s\,\.\/\-]+|[\d]+)|(\(([\da-zA-Z]|[^)^(]+){1,}\))
It selects all alphanumeric n some special character with no empty brackets.
You can use Negative lookahead construction that asserts what is invalid to match. Its format is (?! ... )
For your case you can try something like this:
This will not match the input string if it has 2 or more consecutive dots, commas or slashes (or any combination of them)
(?!.*[.,\/]{2}) ... rest of the regex
This will not match the input string if it has more than 5 characters 'A'.
(?!(.*A.*){5}) ... rest of the regex
This will match everything except your restrictions. Repplace last part (.*) with your regex.
^(?!.*[.,\/]{2})(?!(.*\..*){5})(?!(.*,.*){5})(?!(.*\/.*){5}).*$
Note: This regex may no be optimized. It may be faster if you use loop to iterate over string characters and count their occurences.
You can use this regex:
^(?![^,./-]*([,./-])\1)(?![^,./-]*([,./-])(?:[^,./-]*\2){4})[ \da-z,./-]+$
In C#:
foundMatch = Regex.IsMatch(yourString, #"^(?![^,./-]*([,./-])\1)(?![^,./-]*([,./-])(?:[^,./-]*\2){4})[ \da-z,./-]+$", RegexOptions.IgnoreCase);
Explanation
The ^ anchor asserts that we are at the beginning of the string
The negative lookahead (?![^,./-]*([,./-])\1) asserts that it is not possible to match any number of special chars, followed by one special char (captured to Group 1) followed by the same special char (the \1 backreference)
The negative lookahead (?![^,./-]*([,./-])(?:[^,./-]*\2){4}) ` asserts that it is not possible to match any number of special chars, followed by one special char (captured to Group 2), then any non-special char and that same char from Group 2, four times (five times total)
The $ anchor asserts that we are at the end of the string
A regular expression string to detect invalid strings is:
[^\w \-\r\n]{2}|(?:[\w \-]+[^\w \-\r\n]){5}
As C# string literal (regular and verbatim):
"[^\\w \\-\\r\\n]{2}|(?:[\\w \\-]+[^\\w \\-\\r\\n]){5}"
#"[^\w \-\r\n]{2}|(?:[\w \-]+[^\w \-\r\n]){5}"
It is much easier to find a string than to validate if a string does not contain ...
It can be checked with this expression if the string entered by the user is invalid because of a match of 2 special characters in sequence OR 5 special characters used in the string.
Explanation:
[^...] ... a negative character class definition which matches any character NOT being one of the characters listed within the square brackets.
\w ... a word character which is either a letter, a digit or an underscore.
The next character is simply a space character.
\- ... the hyphen character which must be escaped with a backslash within square brackets as otherwise the hyphen character would be interpreted as "FROM x TO z" (except when being the first or the last character within the square brackets).
\r ... carriage return
\n ... line-feed
Therefore [^\w \-\r\n] finds a character which is NOT a letter, NOT a digit, NOT an underscore, NOT a space, NOT a hyphen, NOT a carriage return and also NOT a line-feed.
{2} ... the preceding expression must match 2 such characters.
So with the expression [^\w \-\r\n]{2} it can be checked if the string contains 2 special characters in a sequence which makes the string invalid.
| ... OR
(?:...) ... none marking group needed here for applying the expression inside with the multiplier {5} at least 5 times.
[...] ... a positive character class definition which matches any character being one of the characters listed within the square brackets.
[\w \-]+ ... find a word character, or a space, or a hyphen 1 or more times.
[^\w \-\r\n] ... and next character being NOT a word character, space, hyphen, carriage return or line-feed.
Therefore (?:[\w \-]+[^\w \-\r\n]){5} finds a string with 5 "special" characters between "standard" characters.

Regex expression to match whole word with special characters not working ? [duplicate]

This question already has an answer here:
Regex expression to match whole word ?
(1 answer)
Closed 4 years ago.
I was going through this question
C#, Regex.Match whole words
It says for match whole word use "\bpattern\b"
This works fine for match whole word without any special characters since it is meant for word characters only!
I need an expression to match words with special characters also. My code is as follows
class Program
{
static void Main(string[] args)
{
string str = Regex.Escape("Hi temp% dkfsfdf hi");
string pattern = Regex.Escape("temp%");
var matches = Regex.Matches(str, "\\b" + pattern + "\\b" , RegexOptions.IgnoreCase);
int count = matches.Count;
}
}
But it fails because of %. Do we have any workaround for this?
There can be other special characters like 'space','(',')', etc
If you have non-word characters then you cannot use \b. You can use the following
#"(?<=^|\s)" + pattern + #"(?=\s|$)"
Edit: As Tim mentioned in comments, your regex is failing precisely because \b fails to match the boundary between % and the white-space next to it because both of them are non-word characters. \b matches only the boundary between word character and a non-word character.
See more on word boundaries here.
Explanation
#"
(?<= # Assert that the regex below can be matched, with the match ending at this position (positive lookbehind)
# Match either the regular expression below (attempting the next alternative only if this one fails)
^ # Assert position at the beginning of the string
| # Or match regular expression number 2 below (the entire group fails if this one fails to match)
\s # Match a single character that is a “whitespace character” (spaces, tabs, and line breaks)
)
temp% # Match the characters “temp%” literally
(?= # Assert that the regex below can be matched, starting at this position (positive lookahead)
# Match either the regular expression below (attempting the next alternative only if this one fails)
\s # Match a single character that is a “whitespace character” (spaces, tabs, and line breaks)
| # Or match regular expression number 2 below (the entire group fails if this one fails to match)
$ # Assert position at the end of the string (or before the line break at the end of the string, if any)
)
"
If the pattern can contain characters that are special to Regex, run it through Regex.Escape first.
This you did, but do not escape the string that you search through - you don't need that.
output = Regex.Replace(output, "(?<!\w)-\w+", "")
output = Regex.Replace(output, " -"".*?""", "")

Categories

Resources