Disallow whitespaces in regex - c#

I have this code:
private static bool IsTextAllowed(string text)
{
Regex regex = new Regex("[^0-9]+"); // Regex that matches disallowed text
return !regex.IsMatch(text);
}
private void TextboxClientID_PreviewTextInput(object sender, TextCompositionEventArgs e)
{
e.Handled = !IsTextAllowed(e.Text);
}
This still allows whitespace in the textbox. How can I prevent whitespace from being entered as well?

In regex, \s matches any single whitespace character: spaces, tabs, newlines, carriage returns, form feeds (used by printers to start a new page) and, in .NET, other Unicode separator characters as well.
To exclude whitespace you can use the character class [^\\s]. In a regular C# string literal you have to write \\ to produce a single \, which the regex engine then reads as \s. A lone \s in a regular string literal is not a valid C# escape sequence and will not compile.
Outside of a character class, the ^ and $ characters match the beginning and end of the string respectively.
So, you could use the regex ^[^0-9\\s]+$. Here is a breakdown of what it does:
The first ^ matches the beginning of the string.
Next, we have the character class enclosed in [], which matches any single character in that class.
Inside of the [], we have ^0-9\\s:
The leading ^ negates the class, so it matches any single character that is none of the following:
The 0-9 part covers any digit between 0 and 9.
The \\s part produces \s, which covers any whitespace character.
The + repeats the character class one or more times.
The final $ matches the end of the string.
Your code could be:
private static bool IsTextAllowed(string text)
{
    // The pattern describes allowed text (no digits, no whitespace), so the result is not negated
    Regex regex = new Regex("^[^0-9\\s]+$");
    return regex.IsMatch(text);
}
Here's a regex101 test: https://regex101.com/r/aS9xT0

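^[^0-9 ]+$
Try this. It rejects any input that contains a space (or a digit). Note that it only covers the space character itself, not tabs or other whitespace.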

Not a regex answer, but I had the same issue with the space character.
I fixed this by adding a PreviewKeyDown handler on the TextBox and setting e.Handled to true if the spacebar has been pressed, like this:
private void TextBox_PreviewKeyDown(object sender, KeyEventArgs e)
{
e.Handled = e.Key == Key.Space;
}

Use:
@"^[^\d\s]+$"
\d ... Match a digit (0-9).
\s ... Match a whitespace character.
private static bool IsTextAllowed(string text)
{
    return Regex.IsMatch(text, @"^[^\d\s]+$");
}

Why bother with regex? Simply use (this needs using System.Linq):
private static bool IsTextAllowed(string text)
{
return text.All(c => !char.IsWhiteSpace(c));
}

Old question, but to disallow whitespace you can use \S ("any non-whitespace character") together with the + quantifier. It will match basically anything, even '∆', but no whitespace (\n, \r, \t, \f, \v). Anchor it with ^ and $ so that the whole input has to match:
^\S+$
But a number input without whitespace can simply be, as @vks mentioned:
^[0-9]+$
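A minimal console sketch of the anchored digits-only check described above. The names InputValidation and IsDigitsOnly are made up for illustration, and \z is used in place of $ on the assumption that a trailing newline should be rejected as well.

```csharp
using System;
using System.Text.RegularExpressions;

public static class InputValidation
{
    // Hypothetical helper: true only when the whole input consists of ASCII digits.
    // \z anchors at the very end of the string ($ would also match before a final newline).
    public static bool IsDigitsOnly(string text)
    {
        return Regex.IsMatch(text, @"^[0-9]+\z");
    }

    public static void Main()
    {
        foreach (string sample in new[] { "123", "12 3", " ", "12a" })
        {
            Console.WriteLine("\"" + sample + "\" allowed: " + IsDigitsOnly(sample));
        }
    }
}
```

Because the pattern describes allowed text, the result of IsMatch is used directly rather than negated.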

Related

Find pipe in quotes ignore false positives [duplicate]

I'm trying to replace the pipe delimiter character with a space when it appears inside quotes. The issue is that I get too many false positives because some strings are empty. I only want to replace the pipe if there is text between the quotes. The regex pattern I'm using is from another Stack Overflow post, as my regex skills are lacking.
Data sample:
"Hello"|"Green | Blue"|123.45|""|""|""|5|45
Code I'm using:
internal class Program
{
public static void Main()
{
string pattern = @"(?: (?<= "")|\G(?!^))(\s*[^"" |\s]+(?:\s +[^
""|\s]+)*)\s*\|\s*(?=[^""] * "")";
string substitution = @"\1 \2";
string input = @"""20190430|""Test Text""|""""|""""|""Manual""|""""|""Machine""|""""|""""|10.00|""""|0.00|||0.00||5600.00||||""A+""|""""|40.00||""""|""Vision Service |Troubleshoot""|57|""Y""|838|""Yellow Maroon""|850||""FL""||||0.00|||||||||||""""||""""||""""|||""""||||||""""||""""|""""||""""|""""||||||""""|""""|""""||||||||1||""";
RegexOptions options = RegexOptions.Multiline;
Regex regex = new Regex(pattern, options);
string result = regex.Replace(input, substitution);
Console.WriteLine("Result:" + result);
Console.ReadKey();
}
}
It replaces the 'Green | Blue' pipe just fine, but it also replaces pipes between quotes later in the line, which breaks the file because columns get removed.
I updated the code with an actual sample of the file I'm processing. The regex finds the match but doesn't replace the pipe. I'm missing something.

If there should be text between the double quotes and the text should be on both sides of the pipe, you might use:
(?<=")(\s*[^"\s|]+)\s*\|\s*([^\s"|]+\s*)(?=")
In the replacement use $1 $2
Explanation
(?<=") Positive lookbehind, assert what is on the left is "
(\s*[^"\s|]+) Capture in group 1 matching 0+ times a whitespace char, 1+ times not ", | or a whitespace char
\s*\|\s* Match a | between 0+ times a whitespace char
([^\s"|]+\s*) Capture in group 2 matching 1+ times not ", | or a whitespace char and match 0+ times a whitespace char
(?=") Positive lookahead, assert what is on the right is "
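As a rough sketch, the pattern explained above can be applied in C# like this. The class and method names (PipeInQuotes, ReplaceQuotedPipe) are invented for illustration; the pattern and the $1 $2 replacement are the ones from this answer, and the input is the data sample from the question.

```csharp
using System;
using System.Text.RegularExpressions;

public static class PipeInQuotes
{
    // Pattern from the answer above; "" is an escaped quote inside a verbatim string.
    private const string Pattern = @"(?<="")(\s*[^""\s|]+)\s*\|\s*([^\s""|]+\s*)(?="")";

    // Hypothetical helper: replaces a single pipe inside a quoted field with a space.
    public static string ReplaceQuotedPipe(string line)
    {
        return Regex.Replace(line, Pattern, "$1 $2");
    }

    public static void Main()
    {
        string input = @"""Hello""|""Green | Blue""|123.45|""""|""""|""""|5|45";
        // Prints: "Hello"|"Green Blue"|123.45|""|""|""|5|45
        Console.WriteLine(ReplaceQuotedPipe(input));
    }
}
```

The empty "" fields are left alone because group 1 requires at least one character that is not a quote, a pipe or whitespace right after the opening quote.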
Edit
If you want to replace multiple pipes with a space between the double quotes you could make use of the \G anchor to assert the position at the end of previous match.
In the replacement use the first capturing group followed by a space $1
(?:(?<=")|\G(?!^))(\s*[^"|\s]+(?:\s+[^"|\s]+)*)\s*\|\s*(?=[^"]*")
Explanation
(?: Non capturing group
(?<=") Assert what is on the left is "
| Or
\G(?!^) Assert position at the end of the previous match
) Close non capturing group
( Capture group 1
\s*[^"|\s]+ Match 0+ times a whitespace char, followed by 1+ times not a ", | or whitespace char
(?:\s+[^"|\s]+)* Repeat 0+ times matching 1+ whitespace chars followed by 1+ times not a ", | or whitespace char
) Close capturing group 1
\s*\|\s* Match a | between 0+ times a whitespace char
(?=[^"]*") Assert what is on the right is a "
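A small sketch of how the \G variant above might be used from C#. The names QuotedPipes and ReplaceAllQuotedPipes are made up, and the sample line with two pipes inside one quoted field is an assumption chosen to show the repeated matching.

```csharp
using System;
using System.Text.RegularExpressions;

public static class QuotedPipes
{
    // \G pattern from the answer above, written as a verbatim string ("" is one quote).
    private const string Pattern =
        @"(?:(?<="")|\G(?!^))(\s*[^""|\s]+(?:\s+[^""|\s]+)*)\s*\|\s*(?=[^""]*"")";

    // Hypothetical helper: replaces every pipe inside a quoted field with a single space.
    public static string ReplaceAllQuotedPipes(string line)
    {
        // "$1 " keeps the text before the pipe and appends one space.
        return Regex.Replace(line, Pattern, "$1 ");
    }

    public static void Main()
    {
        string input = @"""Hello""|""Green | Blue | Red""|5";
        // Prints: "Hello"|"Green Blue Red"|5
        Console.WriteLine(ReplaceAllQuotedPipes(input));
    }
}
```

Each match ends right after a pipe, and \G lets the next match start exactly there, which is how several pipes in the same field are handled one after another.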

My guess is that we might also want to keep only a single space in the text, in which case this expression,
"([^"]+?)\s+\|\s+([^"]+?)"
with a replacement of $1 $2 might work.
Example
using System;
using System.Text.RegularExpressions;
public class Example
{
    public static void Main()
    {
        string pattern = @"""([^""]+?)\s+\|\s+([^""]+?)""";
        // .NET replacement strings refer to groups as $1 and $2; \1 would be inserted literally.
        // The surrounding quotes are part of the match, so they are dropped unless added back here.
        string substitution = @"$1 $2";
        string input = @"""Hello""|""Green | Blue""|123.45|""""|""""|""""|5|45";
        RegexOptions options = RegexOptions.Multiline;
        Regex regex = new Regex(pattern, options);
        string result = regex.Replace(input, substitution);
    }
}

Regex & C#: Replace all Special Characters except Emojis

I need to replace all special characters in a string except the following (which includes alphabetic characters):
:)
:P
;)
:D
:(
This is what I have now:
string input = "Hi there!!! :)";
string output = Regex.Replace(input, "[^0-9a-zA-Z]+", "");
This replaces all special characters. How can I modify this to not replace mentioned characters (emojis) but replace any other special character?

You may use a known technique: match and capture what you need and match only what you want to remove, and replace with the backreference to Group 1:
(:(?:[D()P])|;\))|[^0-9a-zA-Z\s]
Replace with $1. Note I added \s to the character class, but in case you do not need spaces, remove it.
Pattern explanation:
(:(?:[D()P])|;\)) - Group 1 (what we need to keep):
:(?:[D()P]) - a : followed with either D, (, ) or P
| - or
;\) - a ;) substring
(here, you may extend the capture group with more |-separated branches).
| - or ...
[^0-9a-zA-Z\s] - match any char other than ASCII digits, letters (and whitespace, but as I mentioned, you may remove \s if you do not need to keep spaces).
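The capture-and-keep technique explained above could be wrapped up roughly as follows. EmojiCleaner and StripSpecialChars are hypothetical names; the pattern and the $1 replacement are taken from this answer.

```csharp
using System;
using System.Text.RegularExpressions;

public static class EmojiCleaner
{
    // Group 1 captures the emoticons to keep; the second branch matches what is removed.
    private const string Pattern = @"(:(?:[D()P])|;\))|[^0-9a-zA-Z\s]";

    // Hypothetical helper: removes special characters but keeps the listed emoticons.
    public static string StripSpecialChars(string input)
    {
        // When the second branch matches, group 1 is empty, so the character is dropped.
        return Regex.Replace(input, Pattern, "$1");
    }

    public static void Main()
    {
        Console.WriteLine(StripSpecialChars("Hi there!!! :)"));       // Hi there :)
        Console.WriteLine(StripSpecialChars("Wow!! :D ;) #cool :(")); // Wow :D ;) cool :(
    }
}
```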

I would use a regex to match all emojis and select them out of the text (this needs using System.Linq):
string input = "Hi there!!! :)";
string output = string.Concat(Regex.Matches(input, "[;:][DP)(]+").Cast<Match>().Select(x => x.Value));
Pattern [;:][DP)(]+
[;:] starts with : or ;
[DP)(] ends with D, P, ) or (
+ one or more
Inside a character class | is a literal character, so it is not used as a separator there.

Trim Non-alphanum from beginning and end of string

What is the best way to trim ALL non-alphanumeric characters from the beginning and end of a string? I tried listing the characters I do not need manually, but that doesn't work well. I just need to trim anything that is not alphanumeric.
I tried using this function:
string something = "()&*1#^#47*^#21%Littering aaaannnndóú(*&^1#*32%#**)7(#9&^";
string somethingNew = Regex.Replace(something, @"[^\p{L}-\s]+", "");
But it removes all characters that are non alpha numeric from the string. What I basically want is like this:
"test1" -> test1
#!#!2test# -> 2test
(test3) -> test3
##test4---- -> test4
I do want to support unicode characters but not symbols..
EDIT:
The output of the example should be:
Littering aaaannnndóú
Regards

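Assuming you want to trim non-alphanumeric characters from the start and end of your string:
s = new string(s.SkipWhile(c => !char.IsLetterOrDigit(c)) // drop leading non-alphanumerics
    .Reverse()
    .SkipWhile(c => !char.IsLetterOrDigit(c)) // drop trailing ones (string is reversed here)
    .Reverse()
    .ToArray());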

@"[^\p{L}\s-]+(test\d*)|(test\d*)[^\p{L}\s-]+","$1"

You can use the String.Trim(Char[]) method in the .NET library to trim unwanted characters from a string.
From MSDN: String.Trim Method (Char[])
Removes all leading and trailing occurrences of a set of characters
specified in an array from the current String object.
Before trimming, you first need to identify which characters are non-alphanumeric; those are the ones to pass to String.Trim(Char[]).
Use Char.IsLetterOrDigit() to check whether a character is alphanumeric or not.
From MSDN: Char.IsLetterOrDigit()
Indicates whether a Unicode character is categorized as a letter or a
decimal digit.
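Try this (requires using System.Linq):
string str = "()&*1#^#47*^#21%Littering aaaannnndóú(*&^1#*32%#**)7(#9&^";
// Collect every non-alphanumeric character once, then trim that whole set from both ends
char[] unwanted = str.Where(ch => !char.IsLetterOrDigit(ch)).Distinct().ToArray();
str = str.Trim(unwanted);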
Output:
1#^#47*^#21%Littering aaaannnndóú(*&^1#*32%#**)7(#9

If you need to remove any character which is not alphanumeric, you can use IsLetterOrDigit paired with a Where to go through every character. And because we're working at the char level, we'll need a little Concat at the end to bring everything back into a string.
string result = string.Concat(input.Where(char.IsLetterOrDigit));
which you can easily convert into an extension method
public static class Extensions
{
public static string ToAlphaNum(this string input)
{
return string.Concat(input.Where(char.IsLetterOrDigit));
}
}
that you can use like this :
string testString = "#!#!\"(test123)\"";
string result = testString.ToAlphaNum(); //test123
Note: this will remove every non-alphanumeric character from your string, if you really need to remove only those at the beginning/end, please add more details about what defines a beginning or an end and add more examples.

And you could also replace all the non-letters/numbers at the beginning and/or end of the line:
^[^\p{L}\p{N}]*|[^\p{L}\p{N}]*$
used as
resultString = Regex.Replace(subjectString, @"^[^\p{L}\p{N}]*|[^\p{L}\p{N}]*$", "", RegexOptions.Multiline);
If you really want to remove characters only at the beginning and end of the whole string, and not line by line, drop the RegexOptions.Multiline option, which is what makes ^ and $ match at line breaks.
If you wanted to include leading or trailing underscores, as characters to be retained, you could simplify the regex to:
^\W+|\W+$
The core of the regex:
[^\p{L}\p{N}]
is a negated character class that matches any character that is neither a Unicode letter \p{L} nor a Unicode number \p{N}.
In other words:
Trim non-unicode alphanumeric characters
^[^\p{L}\p{N}]*|[^\p{L}\p{N}]*$
Options: Case sensitive; Exact spacing; Dot doesn't match line breaks; ^$ match at line breaks; Parentheses capture
Match this alternative «^[^\p{L}\p{N}]*»
Assert position at the beginning of a line «^»
Match any single character NOT present in the list below «[^\p{L}\p{N}]*»
Between zero and unlimited times, as many times as possible, giving back as needed (greedy) «*»
A character from the Unicode category “letter” «\p{L}»
A character from the Unicode category “number” «\p{N}»
Or match this alternative «[^\p{L}\p{N}]*$»
Match any single character NOT present in the list below «[^\p{L}\p{N}]*»
Between zero and unlimited times, as many times as possible, giving back as needed (greedy) «*»
A character from the Unicode category “letter” «\p{L}»
A character from the Unicode category “number” «\p{N}»
Assert position at the end of a line «$»
Created with RegexBuddy
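For reference, a minimal sketch that runs the pattern above against the examples from the question. AlphaNumTrim and TrimNonAlphanumeric are made-up names, and RegexOptions.Multiline is left out on the assumption that the input is a single string rather than several lines.

```csharp
using System;
using System.Text.RegularExpressions;

public static class AlphaNumTrim
{
    // Hypothetical helper: strips leading and trailing characters that are
    // neither Unicode letters (\p{L}) nor Unicode numbers (\p{N}).
    public static string TrimNonAlphanumeric(string s)
    {
        return Regex.Replace(s, @"^[^\p{L}\p{N}]*|[^\p{L}\p{N}]*$", "");
    }

    public static void Main()
    {
        Console.WriteLine(TrimNonAlphanumeric("\"test1\""));   // test1
        Console.WriteLine(TrimNonAlphanumeric("#!#!2test#"));  // 2test
        Console.WriteLine(TrimNonAlphanumeric("(test3)"));     // test3
        Console.WriteLine(TrimNonAlphanumeric("##test4----")); // test4
    }
}
```

Characters in the middle of the string are left untouched, since each alternative is anchored to one end.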

Without using regex:
In Java, you could do the following (the C# syntax would be nearly the same, with the same functionality):
while (true) {
    if (word.length() == 0) {
        return ""; // nothing alphanumeric left
    }
    if (!Character.isLetterOrDigit(word.charAt(0))) {
        word = word.substring(1);
        continue; // strip from the front first
    }
    if (!Character.isLetterOrDigit(word.charAt(word.length() - 1))) {
        word = word.substring(0, word.length() - 1);
        continue; // then strip from the end
    }
    break; // front and end are both done
}

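You could use this pattern
^[^[:alnum:]]+|[^[:alnum:]]+$
with the g (global) option. Note that POSIX classes such as [:alnum:] work in PCRE-style engines but are not supported by .NET's Regex class.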

Replace special character with white space through regex

I have a function which replaces characters:
public static string Replace(string value)
{
value = Regex.Replace(value, "[\n\r\t]", " ");
return value;
}
For value="abc\nbcd abcd abcd\ ", any unwanted whitespace in the string should be removed as well. That is, I want a result like this:
value="abcabcdabcd". Please help me change the regex pattern to get the desired result. Thanks a lot.

If you need to remove any number of whitespace characters from the string, probably you're looking for something like this:
value = Regex.Replace(value, @"\s+", "");
where \s matches any whitespace character and + means one or more times.

Instead of replacing your newline, tab, etc. characters with a space, just replace all whitespace with nothing:
public static string RemoveWhitespace(string value)
{
return Regex.Replace(value, "\\s", "");
}
\s is a special character group that matches all whitespace characters. (The backslash is doubled because the backslash has a special meaning in C# strings as well.) The following MSDN link contains the exact definition of that character group:
Character Classes: White-Space Character: \s

You may want to try \s, which matches whitespace. With the statement Regex.Replace(value, @"\s", ""), the output will be "abcabcdabcd".
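A small runnable sketch of the whitespace removal suggested in the answers above. The class name WhitespaceRemover and the sample strings are assumptions for illustration.

```csharp
using System;
using System.Text.RegularExpressions;

public static class WhitespaceRemover
{
    // Removes every run of whitespace (spaces, tabs, newlines, ...) from the value.
    public static string RemoveWhitespace(string value)
    {
        return Regex.Replace(value, @"\s+", "");
    }

    public static void Main()
    {
        Console.WriteLine(RemoveWhitespace("abc\nbcd abcd abcd ")); // abcbcdabcdabcd
        Console.WriteLine(RemoveWhitespace("a\tb\r\nc"));           // abc
    }
}
```

Using the verbatim string @"\s+" avoids having to double the backslash, as the regular string "\\s" does.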

How can I remove quoted string literals from a string in C#?

I have a string:
Hello "quoted string" and 'tricky"stuff' world
and want to get the string minus the quoted parts back. E.g.,
Hello and world
Any suggestions?

resultString = Regex.Replace(subjectString,
@"([""']) # Match a quote, remember which one
(?: # Then...
(?!\1) # (as long as the next character is not the same quote as before)
. # match any character
)* # any number of times
\1 # until the corresponding closing quote
\s* # plus optional whitespace
",
"", RegexOptions.IgnorePatternWhitespace);
will work on your example.
resultString = Regex.Replace(subjectString,
@"([""']) # Match a quote, remember which one
(?: # Then...
(?!\1) # (as long as the next character is not the same quote as before)
\\?. # match any escaped or unescaped character
)* # any number of times
\1 # until the corresponding closing quote
\s* # plus optional whitespace
",
"", RegexOptions.IgnorePatternWhitespace);
will also handle escaped quotes.
So it will correctly transform
Hello "quoted \"string\\" and 'tricky"stuff' world
into
Hello and world
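The second pattern above can also be written on a single line without RegexOptions.IgnorePatternWhitespace. The following is only a sketch under that assumption; QuoteStripper and RemoveQuoted are hypothetical names.

```csharp
using System;
using System.Text.RegularExpressions;

public static class QuoteStripper
{
    // Single-line form of the commented pattern above:
    // opening quote, then any (possibly escaped) characters that are not that
    // same quote, then the matching closing quote and optional whitespace.
    private static readonly Regex Quoted = new Regex(@"([""'])(?:(?!\1)\\?.)*\1\s*");

    // Hypothetical helper: removes quoted parts, including the quotes themselves.
    public static string RemoveQuoted(string s)
    {
        return Quoted.Replace(s, "");
    }

    public static void Main()
    {
        // Both lines print: Hello and world
        Console.WriteLine(RemoveQuoted(@"Hello ""quoted string"" and 'tricky""stuff' world"));
        Console.WriteLine(RemoveQuoted(@"Hello ""quoted \""string\\"" and 'tricky""stuff' world"));
    }
}
```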

Use a regular expression to match any quoted strings within the string and replace them with the empty string. Use the Regex.Replace() method to do the pattern matching and replacement.

In case, like me, you're afraid of regex, I've put together a functional way to do it, based on your example string. There's probably a way to make the code shorter, but I haven't found it yet.
private static string RemoveQuotes(IEnumerable<char> input)
{
string part = new string(input.TakeWhile(c => c != '"' && c != '\'').ToArray());
var rest = input.SkipWhile(c => c != '"' && c != '\'');
if(string.IsNullOrEmpty(new string(rest.ToArray())))
return part;
char delim = rest.First();
var afterIgnore = rest.Skip(1).SkipWhile(c => c != delim).Skip(1);
StringBuilder full = new StringBuilder(part);
return full.Append(RemoveQuotes(afterIgnore)).ToString();
}
