I have a string and want to:
remove all characters except English letters (a..z)
replace every whitespace sequence with a single space
How would you do that with C# 3.0 ?
Regex (edited)?
string s = "lsg #~A\tSd 2£R3 ad"; // note tab
s = Regex.Replace(s, @"\s+", " ");
s = Regex.Replace(s, @"[^a-zA-Z ]", ""); // "lsg A Sd R ad"
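One caveat with the two calls above: when a run of removed characters sits between two spaces (for example "a # b"), deleting it after the whitespace has already been collapsed leaves two adjacent spaces. A minimal sketch that swaps the order of the two steps; the names TextCleaner and KeepLettersAndSingleSpaces are made up for illustration and are not from the original answer:

```csharp
using System.Text.RegularExpressions;

public static class TextCleaner
{
    // Hypothetical helper: keeps English letters and single spaces only.
    public static string KeepLettersAndSingleSpaces(string s)
    {
        // Step 1: drop everything that is neither an English letter nor whitespace.
        s = Regex.Replace(s, @"[^a-zA-Z\s]", "");
        // Step 2: collapse each remaining whitespace run (spaces, tabs, newlines) into one space.
        return Regex.Replace(s, @"\s+", " ");
    }
}
```

With this order, "lsg #~A\tSd 2£R3 ad" still becomes "lsg A Sd R ad", and "a # b" becomes "a b". Leading or trailing whitespace is reduced to a single space rather than trimmed.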
Of course the Regex solution is the best one (I think).
But someone HAS to do it in LINQ, so I had some fun. Here you go:
bool inWhiteSpace = false;
string test = "lsg #~A\tSd 2£R3 ad";
var chars = test.Where(c => ('a' <= c && c <= 'z') || ('A' <= c && c <= 'Z') || char.IsWhiteSpace(c))
.Select(c => {
c = char.IsWhiteSpace(c) ? inWhiteSpace ? char.MinValue : ' ' : c;
inWhiteSpace = c == ' ' || c == char.MinValue;
return c;
})
.Where(c => c != char.MinValue);
string result = new string(chars.ToArray());
Using regular expressions of course!
string myCleanString = Regex.Replace(stringToCleanUp, @"[\W]", "");
string myCleanString = Regex.Replace(stringToCleanUp, @"[^a-zA-Z0-9]", "");
I think you can do this with a regular expression, as Marc and boekwurm mentioned.
Try these links as well: http://www.c-sharpcorner.com/UploadFile/prasad_1/RegExpPSD12062005021717AM/RegExpPSD.aspx
http://msdn.microsoft.com/en-us/library/system.text.regularexpressions.regex.aspx
Note: [a-z] is a range of characters. It matches any character in the specified
range. For example, “[a-z]” matches any lowercase alphabetic
character in the range “a” through “z”.
Regular expressions also provide special characters to represent common character
ranges. You could use “[0-9]” to match any numeric digit, or you can use “\d”. Similarly,
“\D” matches any non-digit character. Use “\s” to match any white-space character,
and use “\S” to match any non-white-space character.
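To make the character classes in the note concrete, here is a small sketch that counts single-character matches. CharClassDemo and Count are hypothetical names used only for this illustration:

```csharp
using System.Text.RegularExpressions;

public static class CharClassDemo
{
    // Hypothetical helper: counts how many matches of pattern occur in input.
    public static int Count(string input, string pattern)
    {
        return Regex.Matches(input, pattern).Count;
    }
}
```

For the seven-character input "ab 12\tc", Count gives 2 for @"\d", 5 for @"\D", 2 for @"\s", 5 for @"\S" and 3 for "[a-z]". Keep in mind that in .NET \d also matches non-ASCII Unicode digits by default, so [0-9] is the stricter choice when only ASCII digits are wanted.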
Related
I'm trying to write a method that replaces all occurrences of the characters in the input array (charsToReplace) with the replacementCharacter using regex. The version I have written does not work if the array contains any characters that may change the meaning of the regex pattern, such as ']' or '^'.
public static string ReplaceAll(string str, char[] charsToReplace, char replacementCharacter)
{
if(str.IsNullOrEmpty())
{
return string.Empty;
}
var pattern = $"[{new string(charsToReplace)}]";
return Regex.Replace(str, pattern, replacementCharacter.ToString());
}
So ReplaceAll("/]a", {'/', ']' }, 'a') should return "aaa".
Inside a character class, only 4 chars require escaping: ^, -, ] and \. You can't use Regex.Escape because it does not escape - and ], as they are not "special" outside a character class. Note that Regex.Escape is meant to be used only for literal char (sequences) that are outside character classes.
An unescaped ] char will close your character class prematurely and that is the main reason why your code does not work.
So, the fixed pattern variable definition can look like
var pattern = $"[{string.Concat(charsToReplace).Replace(@"\", @"\\").Replace("-", @"\-").Replace("^", @"\^").Replace("]", @"\]")}]";
See an online C# demo.
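The same escaping rule can be written as a loop, which some may find easier to read than the chained Replace calls. This is a sketch under assumptions: the class name CharClassReplacer is invented, and returning the input unchanged for a null or empty argument is a choice made here, not the behaviour of the code in the question:

```csharp
using System.Text;
using System.Text.RegularExpressions;

public static class CharClassReplacer
{
    public static string ReplaceAll(string str, char[] charsToReplace, char replacementCharacter)
    {
        // Assumption: nothing to do for empty input; this also avoids the invalid pattern "[]".
        if (string.IsNullOrEmpty(str) || charsToReplace == null || charsToReplace.Length == 0)
            return str;

        var pattern = new StringBuilder("[");
        foreach (char c in charsToReplace)
        {
            // Only these four characters are special inside a character class.
            if (c == '\\' || c == '-' || c == '^' || c == ']')
                pattern.Append('\\');
            pattern.Append(c);
        }
        pattern.Append(']');

        // A MatchEvaluator keeps the replacement literal (no "$1"-style substitutions).
        string replacement = replacementCharacter.ToString();
        return Regex.Replace(str, pattern.ToString(), m => replacement);
    }
}
```

With this, ReplaceAll("/]a", new[] { '/', ']' }, 'a') returns "aaa", which is the result the question asks for.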
I suggest using LINQ, not regular expressions:
using System.Linq;
...
public static string ReplaceAll(
string str, char[] charsToReplace, char replacementCharacter)
{
// Please, note IsNullOrEmpty syntax
// we should validate charsToReplace as well
if (string.IsNullOrEmpty(str) || null == charsToReplace || charsToReplace.Length <= 0)
return str; // let's just do nothing (say, not turn null into empty string)
return string.Concat(str.Select(c => charsToReplace.Contains(c)
? replacementCharacter
: c));
}
If you insist on Regex, note that we should Regex.Escape the chars within charsToReplace. However, according to the manual, Regex.Escape doesn't escape - and ], which have special meaning within regular expression brackets.
public static string ReplaceAll(
string str, char[] charsToReplace, char replacementCharacter) {
if (string.IsNullOrEmpty(str) || null == charsToReplace || charsToReplace.Length <= 0)
return str;
string charsSet = string.Concat(charsToReplace
.Select(c => new char[] { ']', '-' }.Contains(c) // in case of '-' and ']'
? $@"\{c}" // escape them as well
: Regex.Escape(c.ToString())));
return Regex.Replace(
str,
$"[{charsSet}]+",
m => new string(replacementCharacter, m.Length));
}
I have string like this:
test- qweqw (Barcelona - Bayer) - testestsetset
And I need to capture the word Bayer. I tried this regex (between "-" and ")"):
(?<=-)(.*)(?=\))
Example: https://regex101.com/r/wI9zD0/2
As you can see, it does not work quite right. What should I fix?
Here's a different regex to do what you are looking for:
-\s([^()]+)\)
https://regex101.com/r/wI9zD0/3
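The attempt in the question fails because the greedy .* starts right after the first "-" (the one in "test-") and runs up to the last ")", so it captures " qweqw (Barcelona - Bayer". The pattern above avoids this by not letting the capture cross a parenthesis. A minimal C# sketch of reading the captured group; TeamExtractor and SecondTeam are hypothetical names:

```csharp
using System.Text.RegularExpressions;

public static class TeamExtractor
{
    // Hypothetical helper: returns the text after "- " up to the closing ")",
    // or null when the input contains no such part.
    public static string SecondTeam(string input)
    {
        Match m = Regex.Match(input, @"-\s([^()]+)\)");
        return m.Success ? m.Groups[1].Value : null;
    }
}
```

For "test- qweqw (Barcelona - Bayer) - testestsetset" this returns "Bayer": the first "- " is rejected because an opening parenthesis appears before any ")".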
You don't need regex for that, you can use LINQ:
string input = "test - qweqw(Barcelona - Bayer) - testestsetset";
string res = String.Join("", input.SkipWhile(c => c != '(')
.SkipWhile(c => c != '-').Skip(1)
.TakeWhile(c => c != ')'))
.Trim();
Console.WriteLine(res); // Bayer
What I want to do is take a string like the following
This is my string and *this text* should be wrapped with <strong></strong>
The result should be
This is my string and <strong>this text</strong> should be wrapped with <strong></strong>
This seems to work pretty well:
var str = "This is my string and *this text* should be wrapped with";
var updatedstr = String.Concat(
Regex.Split(str, @"\*")
.Select((p, i) => i % 2 == 0 ? p :
string.Concat("<strong>", p, "</strong>"))
.ToArray()
);
What about this:
string s = "This is my string and *this text* should be wrapped with <strong></strong>";
int i = 0;
while (s.IndexOf('*') > -1)
{
string tag = i % 2 == 0 ? "<strong>" : "</strong>"; // even hits open the tag, odd hits close it
s = s.Substring(0, s.IndexOf('*')) + tag + s.Substring(s.IndexOf('*') + 1);
++i;
}
Or Marty Wallace's regex idea in the comments on the question, \*[^*]+?\*
You can use a very simple regex for this case:
var text = "";
text = Regex.Replace(text, @"\*([^*]*)\*", "<b>$1</b>");
See the .NET regex demo. Here, \*([^*]*)\* matches
\* - a literal asterisk (* is a special regex metacharacter and needs escaping in the literal meaning)
([^*]*) - Group 1: zero or more chars other than a * char
\* - a * char.
The $1 in the replacement pattern refers to the value captured in Group 1.
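Applied to the sentence from the question, and using the strong tag the question asks for instead of b, the pattern explained above can be sketched like this. Emphasis and WrapInStrong are illustrative names, not part of the answer:

```csharp
using System.Text.RegularExpressions;

public static class Emphasis
{
    // Hypothetical helper: wraps every *...* pair in <strong>...</strong>.
    public static string WrapInStrong(string text)
    {
        // $1 inserts whatever Group 1 captured between the two asterisks.
        return Regex.Replace(text, @"\*([^*]*)\*", "<strong>$1</strong>");
    }
}
```

A lone asterisk without a partner is left untouched, because the pattern needs both the opening and the closing asterisk to match.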
I have a string like "AAA 101 B202 C 303 " and I want to get rid of the space between char and number if there is any.
So after operation, the string should be like "AAA101 B202 C303 ". But I am not sure whether regex could do this?
Any help? Thanks in advance.
Yes, you can do this with regular expressions. Here's a short but complete example:
using System;
using System.Text.RegularExpressions;
class Test
{
static void Main()
{
string text = "A 101 B202 C 303 ";
string output = Regex.Replace(text, @"(\p{L}) (\d)", @"$1$2");
Console.WriteLine(output); // Prints A101 B202 C303
}
}
(If you're going to do this a lot, you may well want to compile a regular expression for the pattern.)
The \p{L} matches any Unicode letter - you may want to be more restrictive.
You can do something like
([A-Z]+)\s?(\d+)
And replace with
$1$2
The expression can be tightened up, but the above should work for your example input string.
What it does is declare a group containing letters (first set of parentheses), then an optional space (\s?), and then a group of digits (\d+). The groups can be used in the replacement by referring to their index, so when you want to get rid of the space, just replace with $1$2.
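In C#, the pattern and replacement described above could be applied roughly as follows. SpaceRemover and JoinLettersAndDigits are hypothetical names chosen for this sketch:

```csharp
using System.Text.RegularExpressions;

public static class SpaceRemover
{
    // Hypothetical helper: removes one optional whitespace character
    // between a run of upper-case letters and a run of digits.
    public static string JoinLettersAndDigits(string text)
    {
        return Regex.Replace(text, @"([A-Z]+)\s?(\d+)", "$1$2");
    }
}
```

For the input "AAA 101 B202 C 303 " this yields "AAA101 B202 C303 ". Note that [A-Z] covers upper-case English letters only and that \s also matches tabs and newlines, which is part of what "tightening up" the expression would address.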
While not as concise as Regex, the C# code for something like this is fairly straightforward and very fast-running:
StringBuilder sb = new StringBuilder();
for(int i=0; i<s.Length; i++)
{
// exclude spaces preceded by a letter and followed by a digit
if(!(s[i] == ' '
&& i-1 >= 0 && char.IsLetter(s[i-1])
&& i+1 < s.Length && char.IsDigit(s[i+1])))
{
sb.Append(s[i]);
}
}
return sb.ToString();
Just for fun (because the act of programming is/should be fun sometimes) :o) I'm using LINQ with Aggregate:
var result = text.Aggregate(
string.Empty,
(acc, c) => char.IsLetter(acc.LastOrDefault()) && Char.IsDigit(c) ?
acc + c.ToString() : acc + (char.IsWhiteSpace(c) && char.IsLetter(acc.LastOrDefault()) ?
string.Empty : c.ToString())).TrimEnd();
I read a string from the console. How do I make sure it only contains English characters and digits?
Assuming that by "English characters" you are simply referring to the 26-character Latin alphabet, this would be an area where I would use regular expressions: ^[a-zA-Z0-9]*$
For example:
if( Regex.IsMatch(Console.ReadLine(), "^[a-zA-Z0-9]*$") )
{ /* your code */ }
The benefit of regular expressions in this case is that all you really care about is whether or not a string matches a pattern - this is one where regular expressions work wonderfully. It clearly captures your intent, and it's easy to extend if your definition of "English characters" expands beyond just the 26 alphabetic ones.
There's a decent series of articles here that teach more about regular expressions.
Jørn Schou-Rode's answer provides a great explanation of how the regular expression presented here works to match your input.
You could match it against this regular expression: ^[a-zA-Z0-9]*$
^ matches the start of the string (ie no characters are allowed before this point)
[a-zA-Z0-9] matches any letter from a-z in lower or upper case, as well as digits 0-9
* lets the previous match repeat zero or more times
$ matches the end of the string (ie no characters are allowed after this point)
To use the expression in a C# program, you will need to import System.Text.RegularExpressions and do something like this in your code:
bool match = Regex.IsMatch(input, "^[a-zA-Z0-9]*$");
If you are going to test a lot of lines against the pattern, you might want to compile the expression:
Regex pattern = new Regex("^[a-zA-Z0-9]*$", RegexOptions.Compiled);
for (int i = 0; i < 1000; i++)
{
string input = Console.ReadLine();
pattern.IsMatch(input);
}
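One detail about the anchors explained above: in .NET, $ also matches just before a final newline, so "abc\n" passes ^[a-zA-Z0-9]*$. Console.ReadLine strips the line terminator, so this does not affect the console case, but for strings from other sources \A and \z are stricter. A minimal sketch; AlnumCheck and IsAsciiAlnum are made-up names:

```csharp
using System.Text.RegularExpressions;

public static class AlnumCheck
{
    // \A and \z match only at the very start and very end of the input.
    private static readonly Regex Strict =
        new Regex(@"\A[a-zA-Z0-9]*\z", RegexOptions.Compiled);

    // Hypothetical helper: true when input has only ASCII letters and digits.
    // Like the original pattern, it accepts the empty string because of "*".
    public static bool IsAsciiAlnum(string input)
    {
        return input != null && Strict.IsMatch(input);
    }
}
```

Replacing * with + in the pattern would additionally reject empty input, if that is what is wanted.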
The accepted answer does not work for white space or punctuation. The code below was tested with this input:
Hello: 1. - a; b/c \ _(5)??
(Is English)
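// Note: " -_" inside the brackets is a range from space (0x20) to '_' (0x5F),
// so it also admits !, (, ), /, :, ; and the other ASCII punctuation in that range.
Regex regex = new Regex("^[a-zA-Z0-9. -_?]*$");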
string text1 = "سلام";
bool fls = regex.IsMatch(text1); //false
string text2 = "123 abc! ?? -_)(/\\;:";
bool tru = regex.IsMatch(text2); //true
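The pattern above accepts "!" and "/" only because " -_" is read as a range from the space character to the underscore. If only the characters that are literally listed should be allowed, the hyphen has to be escaped (or placed last). A sketch comparing both variants; PunctuationCheck and its members are hypothetical names:

```csharp
using System.Text.RegularExpressions;

public static class PunctuationCheck
{
    // Pattern from the answer: " -_" is a range (0x20 through 0x5F).
    public const string RangePattern = "^[a-zA-Z0-9. -_?]*$";

    // Assumed alternative: the hyphen is escaped, so it is a literal character.
    public const string ExplicitPattern = @"^[a-zA-Z0-9. _?\-]*$";

    public static bool Matches(string pattern, string text)
    {
        return Regex.IsMatch(text, pattern);
    }
}
```

With these, "a!b" matches RangePattern but not ExplicitPattern, while "a-b_c? 1." matches both. Which of the two is right depends on how much punctuation should count as "English".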
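Another way is to check whether IsLower and IsUpper both return false for a character. Note that both methods are also true for cased non-English letters such as 'é', so this only filters out scripts without letter case.
Something like: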
private bool IsAllCharEnglish(string Input)
{
foreach (var item in Input.ToCharArray())
{
if (!char.IsLower(item) && !char.IsUpper(item) && !char.IsDigit(item) && !char.IsWhiteSpace(item))
{
return false;
}
}
return true;
}
and to use it:
string str = "فارسی abc";
IsAllCharEnglish(str); // return false
str = "These are english 123";
IsAllCharEnglish(str); // return true
Do not use Regex or LINQ here; they are slower than a plain loop over the characters of the string.
Performance test
My solution:
private static bool is_only_eng_letters_and_digits(string str)
{
foreach (char ch in str)
{
if (!(ch >= 'A' && ch <= 'Z') && !(ch >= 'a' && ch <= 'z') && !(ch >= '0' && ch <= '9'))
{
return false;
}
}
return true;
}
Do you have web access? I would assume that cannot be guaranteed, but Google has a language API that will detect the language of the text you pass to it.
google language api
bool onlyEnglishCharacters = !EnglishText.Any(a => a > '~');
Seems cheap, but it worked for me, legit easy answer.
Hope it helps anyone.
bool AllAscii(string str)
{
// Note: Char.IsLetterOrDigit is true for any Unicode letter or digit, not only ASCII ones.
return !str.Any(c => !Char.IsLetterOrDigit(c));
}
Something like this (if you want to control input):
static string ReadLettersAndDigits() {
StringBuilder sb = new StringBuilder();
ConsoleKeyInfo keyInfo;
while ((keyInfo = Console.ReadKey(true)).Key != ConsoleKey.Enter) {
char c = char.ToLower(keyInfo.KeyChar);
if (('a' <= c && c <= 'z') || char.IsDigit(c)) {
sb.Append(keyInfo.KeyChar);
Console.Write(c);
}
}
return sb.ToString();
}
If you don't want to use Regex, an alternative is to check the ASCII code of each character: if it lies within the right range, it is either an English letter or a digit (this might not be the best solution):
foreach (char ch in str)
{
int x = (int)ch;
if ((x >= 65 && x <= 90) || (x >= 97 && x <= 122))
{
// this is an English letter: A-Z (65..90) or a-z (97..122)
}
else if (x >= 48 && x <= 57)
{
// this is a digit: 0-9
}
else
{
// this is something different
}
}
http://en.wikipedia.org/wiki/ASCII for full ASCII table.
But I still think, RegEx is the best solution.
I agree with the Regular Expression answers. However, you could simplify it to just "^[\w]+$". \w is any "word character" (which translates to [a-zA-Z_0-9] if you use a non-Unicode alphabet). I don't know if you want underscores as well.
More on regexes in .net here: http://msdn.microsoft.com/en-us/library/ms972966.aspx#regexnet_topic8
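A caveat for the \w shortcut: in .NET, \w is Unicode-aware by default, so it also accepts letters such as 'é'. Passing RegexOptions.ECMAScript restricts it to [a-zA-Z_0-9]. A minimal sketch of the difference; WordCharCheck and its method names are invented for this example:

```csharp
using System.Text.RegularExpressions;

public static class WordCharCheck
{
    // Default behaviour: \w matches Unicode letters, digits and the underscore.
    public static bool IsWordDefault(string s)
    {
        return Regex.IsMatch(s, @"^\w+$");
    }

    // ECMAScript behaviour: \w is limited to [a-zA-Z_0-9].
    public static bool IsWordAscii(string s)
    {
        return Regex.IsMatch(s, @"^\w+$", RegexOptions.ECMAScript);
    }
}
```

For "caf\u00e9" the default check returns true and the ECMAScript check returns false; "abc_123" passes both, underscore included.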
As many pointed out, accepted answer works only if there is a single word in the string. As there are no answers that cover the case of multiple words or even sentences in the string, here is the code:
stringToCheck.Any(x=> char.IsLetter(x) && !((int)x >= 63 && (int)x <= 126));
<?php
$string="हिन्दी";
$string="Manvendra Rajpurohit";
echo strlen($string); echo '<br>';
echo mb_strlen($string, 'utf-8');
echo '<br>';
if(strlen($string) != mb_strlen($string, 'utf-8'))
{
echo "Please enter English words only:(";
}
else {
echo "OK, English Detected!";
}
?>