What is the regex pattern for any character (including new line)? - C#

Does regex have a pattern that matches any character, including a new line? The dot pattern matches any character except a new line. (Currently I'm using [^~] because the ~ character is rarely used.)
Edit: I'm using regex with C#.

In C#, you can use the RegexOptions.Singleline option.
Use single-line mode, where (.) matches every character (instead of every character except \n)
Instead of passing RegexOptions.Singleline, you can get the same effect by placing an inline modifier at the very beginning of your regular expression.
Regex.Match(input, @"(?s)foo.*bar");

I am not familiar with C#, but regex should work much the same everywhere.
.* matches any character except \n. A simple way around this is to use alternation inside a group: (.|\n)* or (.|\n|\r)*. Alternatively, you can use [\s\S], where \s is any whitespace character and \S is any non-whitespace character. I believe in some languages [^] will work as well, but I don't know about C#. Basically it says "match anything that is not nothing", so it will match anything.
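The approaches from both answers can be compared side by side. This is a minimal sketch: the class name `AnyCharDemo`, its method names, and the sample input are made up for illustration, and only `Regex.IsMatch` and `RegexOptions.Singleline` come from the .NET library.

```csharp
using System;
using System.Text.RegularExpressions;

public static class AnyCharDemo
{
    // Hypothetical sample input: "foo" and "bar" are separated by line breaks.
    public const string Input = "foo\nmiddle\nbar";

    // Default behaviour: '.' does not match '\n', so this fails on Input.
    public static bool DefaultDot(string s) => Regex.IsMatch(s, @"foo.*bar");

    // Singleline option: '.' matches every character, including '\n'.
    public static bool WithOption(string s) =>
        Regex.IsMatch(s, @"foo.*bar", RegexOptions.Singleline);

    // Same effect with the inline (?s) modifier at the start of the pattern.
    public static bool WithInlineFlag(string s) => Regex.IsMatch(s, @"(?s)foo.*bar");

    // Option-free alternative: whitespace or non-whitespace, i.e. anything.
    public static bool WithCharClass(string s) => Regex.IsMatch(s, @"foo[\s\S]*bar");

    public static void Main()
    {
        Console.WriteLine(DefaultDot(Input));     // False
        Console.WriteLine(WithOption(Input));     // True
        Console.WriteLine(WithInlineFlag(Input)); // True
        Console.WriteLine(WithCharClass(Input));  // True
    }
}
```

The [\s\S] form is handy when the option cannot be set for the whole pattern, since it only affects the one spot where it is used.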

Related

Regular Expression Space character not working

My regex is for a Canadian postal code and only allows the valid letters:
Regex pattern = new Regex("^[ABCEGHJKLMNPRSTVXY][0-9][ABCEGHJKLMNPRSTVWXYZ][/s][0-9][ABCEGHJKLMNPRSTVWXYZ][0-9]$");
The problem I am having is that I want to allow a space between the two sets of characters, but I cannot find the correct character to use.
You've got a forward slash instead of a backslash in your regular expression for whitespace (\s). The following regex should work.
@"^[ABCEGHJKLMNPRSTVXY][0-9][ABCEGHJKLMNPRSTVWXYZ][\s][0-9][ABCEGHJKLMNPRSTVWXYZ][0-9]$"
If you are simply looking for whitespace, use \s.
To keep the backslash from being treated as a string escape, use a verbatim string literal (@"...") as in the example below.
Regex pattern = new Regex(@"^[ABCEGHJKLMNPRSTVXY][0-9][ABCEGHJKLMNPRSTVWXYZ]\s[0-9][ABCEGHJKLMNPRSTVWXYZ][0-9]$");
As pointed out in the comments, if the space is optional you can use the ? quantifier as below.
Regex pattern = new Regex(@"^[ABCEGHJKLMNPRSTVXY][0-9][ABCEGHJKLMNPRSTVWXYZ]\s?[0-9][ABCEGHJKLMNPRSTVWXYZ][0-9]$");
Use the \s token for whitespace instead of /s.
Some handy tools to speed up regex development:
regexr.com helps with syntax and provides realtime testing
regexpr.com (yes I know :)) visualizes your expression.
As per other answers....
Use \s instead of /s
You shouldn't need to square bracket the [\s], because it already implies a complete class of characters.
Also...
In most languages, you probably don't want to use double quotes "..." as delimiters to the Regex, since this might be interpolating the \s before the pattern is applied. It's certainly worth a try.
Use a trailing quantifier \s* or \s? to allow the space to be optional.
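Putting the answers together, a small sketch of the corrected pattern with an optional separator might look like this. The letter sets are copied from the question; the class name `PostalCodeDemo`, the method `IsValid`, and the sample codes are hypothetical.

```csharp
using System;
using System.Text.RegularExpressions;

public static class PostalCodeDemo
{
    // Letter sets taken from the question. "\s?" allows zero or one
    // whitespace character between the two three-character halves.
    private static readonly Regex Pattern = new Regex(
        @"^[ABCEGHJKLMNPRSTVXY][0-9][ABCEGHJKLMNPRSTVWXYZ]\s?[0-9][ABCEGHJKLMNPRSTVWXYZ][0-9]$");

    public static bool IsValid(string code) => Pattern.IsMatch(code);

    public static void Main()
    {
        Console.WriteLine(IsValid("K1A 0B1")); // True: with a space
        Console.WriteLine(IsValid("K1A0B1"));  // True: space omitted
        Console.WriteLine(IsValid("K1A/0B1")); // False: '/' is not whitespace
        Console.WriteLine(IsValid("W1A 0B1")); // False: W is not in the first set
    }
}
```

Since \s also accepts tabs and other whitespace, replacing `\s?` with a literal space followed by `?` would be the stricter choice if only a plain space should be allowed.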

Regex \b with words starting with special characters

I'm having some difficulties with the regex word boundary \b. I need to search for an exact keyword inside some loaded text (either plain textual data or XML). Because I need exact matches I use the \bkeyword\b pattern, but I get different behaviour from what I expect when the keyword starts with a special character. For example, the pattern \b€ 3,5\b doesn't match in I have € 3,5 to spend!. This is the case with any special character.
I've searched around but came up with no solution. Is there some mechanism that acts like \b but for special characters? Also note that I cannot alter the keyword.
Any help would be appreciated.
You can perhaps make use of a positive lookbehind:
(?<=^|\s)€ 3,5\b
The positive lookbehind will match either the beginning of the string or a \s, without including them in the match itself.
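A short sketch of why \b fails here and the lookbehind does not. \b only matches between a word character and a non-word character, and both the space and € are non-word characters. The class and method names are invented for illustration; the sentence is the one from the question.

```csharp
using System;
using System.Text.RegularExpressions;

public static class BoundaryDemo
{
    // Sample text from the question.
    public const string Text = "I have € 3,5 to spend!";

    // Fails on Text: there is no word boundary between ' ' and '€'.
    public static bool WithWordBoundary(string s) => Regex.IsMatch(s, @"\b€ 3,5\b");

    // Requires start of string or whitespace before the keyword instead.
    public static bool WithLookbehind(string s) => Regex.IsMatch(s, @"(?<=^|\s)€ 3,5\b");

    public static void Main()
    {
        Console.WriteLine(WithWordBoundary(Text)); // False
        Console.WriteLine(WithLookbehind(Text));   // True
    }
}
```

Note that the two patterns are not interchangeable: in a string such as "costs€ 3,5 now" the \b version matches (there is a boundary between "s" and "€") while the lookbehind version does not.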

Regex to find an anchor tag that contains a new line in C# .NET

I want to find the href from an anchor tag. So I have used this regex:
<a\s*[^>]*\s*href\s*\=\s*([^(\s*|\>)]*)\s*[^>]*>\s*Text\s*<\/a>
Options = Ignorecase + singleline
Example
<a href="/abc/xzy/pqr.com" class="m">Text</a>
So Group[1]="/abc/xzy/pqr.com"
But If the content is like
<a href="/abc/xzy/ //Contains new line
pqr.com" class="m">Text</a>
so Group[1]="/abc/xzy/
So I want to know how to get "/abc/xzy/pqr.com" if the content contains new line(\r\n)
Your capture group is a bit weird: [^(\s*|\>)]* is a character class, so it matches any character that is not (, not whitespace (\s), not an asterisk *, and so on.
What you can do however is to put quotes before and after the capture group:
<a\s*[^>]*\s*href\s*\=\s*"([^(\s*|\>)]*)"\s*[^>]*>\s*Text\s*<\/a>
                         ^              ^
And then change the character class to [^"] (not quotes):
<a\s*[^>]*\s*href\s*\=\s*"([^"]*)"\s*[^>]*>\s*Text\s*<\/a>
                           ^^^^
regex101 demo.
This said, it would be better to use a proper html parser instead of regex. It's just that it's more tedious to make a suitable regex because you can forget about a lot of different scenarios, but if you're certain of how your data comes through, regex might be a quick way to get what you need.
If you want to consider single quotes and no quotes at all in some cases, you might try this instead:
<a\s*[^>]*\s*href\s*=\s*((?:[^ ]|[\n\r])+)\s*[^>]*>\s*Text\s*<\/a>
Updated regex101.
This regex has this part instead (?:[^ ]|[\n\r])+ which accepts non-spaces and newlines (and carriage returns just in case). Note that \s contains white spaces, tabs, newlines and form-feed.
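As a rough sketch of the quoted-attribute variant in C#: the pattern is the [^"] version from the answer, written as a verbatim string (so the quotes are doubled). Stripping \r and \n from the captured value afterwards is an added assumption about what the asker wants, not part of the original answer; `HrefDemo` and `TryExtractHref` are hypothetical names.

```csharp
using System;
using System.Text.RegularExpressions;

public static class HrefDemo
{
    // [^""]* (that is, [^"]*) also matches line breaks, so the capture
    // spans the newline inside the attribute value.
    private static readonly Regex Anchor = new Regex(
        @"<a\s*[^>]*\s*href\s*\=\s*""([^""]*)""\s*[^>]*>\s*Text\s*<\/a>",
        RegexOptions.IgnoreCase | RegexOptions.Singleline);

    // Returns true and the href with line breaks removed when the anchor is found.
    public static bool TryExtractHref(string html, out string href)
    {
        Match m = Anchor.Match(html);
        href = m.Success
            ? m.Groups[1].Value.Replace("\r", "").Replace("\n", "")
            : "";
        return m.Success;
    }

    public static void Main()
    {
        string html = "<a href=\"/abc/xzy/\r\npqr.com\" class=\"m\">Text</a>";
        if (TryExtractHref(html, out string href))
        {
            Console.WriteLine(href); // /abc/xzy/pqr.com
        }
    }
}
```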

Regex Expressions for all non alphanumeric symbols

I am trying to make a regular expression for a string that has at least 1 non alphanumeric symbol in it
The code I am trying to use is
Regex symbolPattern = new Regex("?[!@#$%^&*()_-+=[{]};:<>|./?.]");
I'm trying to match only one of !@#$%^&*()_-+=[{]};:<>|./?. but it doesn't seem to be working.
If you want to match non-alphanumeric symbols then just use \W|_.
Regex pattern = new Regex(@"\W|_");
This will match anything except letters and digits (\W on its own does not match the underscore, hence the |_). Information on the \W character class and others is available here (C# Regex Cheat Sheet).
https://www.mikesdotnetting.com/article/46/c-regular-expressions-cheat-sheet
You could also avoid regular expressions if you want:
return s.Any(c => !char.IsLetterOrDigit(c));
Can you check for the opposite condition?
// "input" stands for the string being checked.
Match match = Regex.Match(input, @"^([a-zA-Z0-9]+)$");
if (match.Success) {
    // it's purely alphanumeric
} else {
    // it has one of those characters in it.
}
I didn't get your entire question, but this regex will match strings that contain at least one non-alphanumeric character. That includes whitespace (couldn't see that in your list though)
[^\w]+
Your regex just needs a little tweaking. The hyphen is used to form ranges like A-Z, so if you want to match a literal hyphen, you either have to escape it with a backslash or move it to the end of the list. You also need to escape the square brackets because they're the delimiters for the character class. Then get rid of that question mark at the beginning and you're in business.
Regex symbolPattern = new Regex(@"[!@#$%^&*()_+=\[{\]};:<>|./?,-]");
If you only want to match ASCII punctuation characters, this is probably the simplest way. \W matches whitespace and control characters in addition to punctuation, and it matches them from the entire Unicode range, not just ASCII.
You seem to be missing a few characters, though: the backslash, apostrophe and quotation mark. Adding those gives you:
@"[!@#$%^&*()_+=\[{\]};:<>|./?,\\'""-]"
Finally, it's a good idea to always use C#'s verbatim string literals (@"...") for regexes; it saves you a lot of hassle with backslashes. Quotation marks are escaped by doubling them.
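To see how the suggestions in this thread differ, here is a small comparison sketch. The two patterns and the LINQ check are taken from the answers above; the class name `SymbolDemo`, the method names, and the sample strings are hypothetical.

```csharp
using System;
using System.Linq;
using System.Text.RegularExpressions;

public static class SymbolDemo
{
    // Explicit list of ASCII symbols, with the hyphen moved to the end.
    private static readonly Regex ListedSymbols =
        new Regex(@"[!@#$%^&*()_+=\[{\]};:<>|./?,\\'""-]");

    // Any non-word character, or an underscore.
    private static readonly Regex NonWordOrUnderscore = new Regex(@"\W|_");

    public static bool HasListedSymbol(string s) => ListedSymbols.IsMatch(s);

    public static bool HasNonAlphanumeric(string s) => NonWordOrUnderscore.IsMatch(s);

    // Regex-free variant from the answers.
    public static bool HasNonAlphanumericLinq(string s) =>
        s.Any(c => !char.IsLetterOrDigit(c));

    public static void Main()
    {
        foreach (string s in new[] { "abc123", "pass-word", "two words" })
        {
            Console.WriteLine(
                $"{s}: {HasListedSymbol(s)} {HasNonAlphanumeric(s)} {HasNonAlphanumericLinq(s)}");
        }
        // abc123: False False False
        // pass-word: True True True
        // two words: False True True
    }
}
```

The last line shows the practical difference: a space is not in the explicit list, but it does count as non-alphanumeric for \W and for char.IsLetterOrDigit.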

Regular Expression for alphanumeric and space

What is the regular exp for a text that can't contain any special characters except space?
Because Prajeesh only wants to match spaces, \s will not suffice as it matches all whitespace characters including line breaks and tabs.
A character set that should universally work across all RegEx parsers is:
[a-zA-Z0-9 ]
Further control depends on your needs. Word boundaries, multi-line support, etc... I would recommend visiting Regex Library which also has some links to various tutorials on how Regular Expression Parsing works.
[\w\s]*
\w will match [A-Za-z0-9_] and the \s will match whitespaces.
[\w ]* should match what you want.
Assuming "special characters" means anything that's not a letter or digit, and "space" means the space character (ASCII 32):
^[A-Za-z0-9 ]+$
You need @"^[A-Za-z0-9 ]+$". The \s character class matches things other than space (such as tab), and since you want to make sure that no part of the string has other characters, you should anchor it with ^ and $.
If you just want letters and spaces, then you can use @"[A-Za-z\s]+" to match at least one letter or whitespace character. You could also use @"[A-Za-z ]+" instead, with a literal space in place of \s.
Otherwise please clarify.
In C#, I'd believe it's ^(\w|\s)*$
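The answers above fall into two groups: a strict ASCII set with a literal space, and a looser \w/\s set. A quick sketch of how they behave on a few made-up inputs follows (`AlnumSpaceDemo`, `Strict` and `Loose` are hypothetical names; the loose pattern is anchored here so both checks validate the whole string).

```csharp
using System;
using System.Text.RegularExpressions;

public static class AlnumSpaceDemo
{
    // Only ASCII letters, digits and the space character.
    public static bool Strict(string s) => Regex.IsMatch(s, @"^[A-Za-z0-9 ]+$");

    // \w also allows '_' and non-ASCII letters; \s also allows tabs and line breaks.
    public static bool Loose(string s) => Regex.IsMatch(s, @"^[\w\s]*$");

    public static void Main()
    {
        Console.WriteLine(Strict("Hello World 42")); // True
        Console.WriteLine(Strict("Hello\tWorld"));   // False: tab is not a space
        Console.WriteLine(Strict("Hello_World"));    // False: underscore
        Console.WriteLine(Loose("Hello\tWorld"));    // True
        Console.WriteLine(Loose("Hello_World"));     // True
        Console.WriteLine(Loose("Hello, World"));    // False: comma
    }
}
```

One caveat: in .NET, $ also matches just before a final \n, so Strict("abc\n") returns True. Using \z instead of $ avoids that if a trailing newline must be rejected.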
