check content of string input - c#

How can I check whether my input is a particular kind of string? For example: no digits, no "/",...

Well, to check that an input is actually an object of type System.String, you can simply do:
bool IsString(object value)
{
    return value is string;
}
To check that a string contains only letters, you could do something like this:
bool IsAllAlphabetic(string value)
{
    foreach (char c in value)
    {
        if (!char.IsLetter(c))
            return false;
    }
    return true;
}
If you want to combine these, you could do so like this:
bool IsAlphabeticString(object value)
{
    string str = value as string;
    return str != null && IsAllAlphabetic(str);
}
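On C# 7 or later, the type test and the cast can be folded into a single pattern. A minimal sketch, assuming an empty string should be rejected; the class and method names (InputChecks, IsNonEmptyAlphabeticString) are made up for illustration:

```csharp
using System.Linq;

public static class InputChecks
{
    // "value is string s" tests the type and binds the variable in one step,
    // replacing the separate "as" cast and null check.
    public static bool IsNonEmptyAlphabeticString(object value)
    {
        return value is string s
            && s.Length > 0              // assumption: "" should not count as valid
            && s.All(char.IsLetter);     // every character must be a letter
    }
}
```

A null argument or a non-string object (such as a boxed int) simply fails the pattern and returns false.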

If you mean "is the string completely letters", you could do:
string myString = "RandomStringOfLetters";
bool allLetters = myString.All( c => Char.IsLetter(c) );
This is based on LINQ and the Char.IsLetter method.
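Two behaviours of this one-liner are easy to miss: All returns true for an empty string, and Char.IsLetter accepts letters from any script, not only A-Z. A small sketch contrasting it with an ASCII-only variant (AllLettersDemo and AllAsciiLetters are hypothetical names):

```csharp
using System.Linq;

public static class AllLettersDemo
{
    // Same check as above: true when no character fails char.IsLetter.
    public static bool AllLetters(string s) => s.All(c => char.IsLetter(c));

    // Restricts the check to the 26 ASCII letters, in either case.
    public static bool AllAsciiLetters(string s) =>
        s.All(c => (c >= 'a' && c <= 'z') || (c >= 'A' && c <= 'Z'));
}
```

If an empty input should be rejected, add an explicit s.Length > 0 test.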

It's not entirely clear what you want, but you can probably do it with a regular expression. For example to check that your string contains only letters in a-z or A-Z you can do this:
string s = "dasglakgsklg";
if (Regex.IsMatch(s, "^[a-z]+$", RegexOptions.IgnoreCase))
{
    Console.WriteLine("Only letters in a-z.");
}
else
{
    // Not only letters in a-z.
}
If you also want to allow spaces, underscores, or other characters simply add them between the square brackets in the regular expression. Note that some characters have a special meaning inside regular expression character classes and need to be escaped with a backslash.
You can also use \p{L} instead of [a-z] to match any Unicode character that is considered to be a letter, including letters in foreign alphabets.
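A sketch combining both suggestions, with names made up for illustration. It also swaps $ for \z: in .NET, $ also matches just before a final "\n", so "^[a-z]+$" accepts "abc\n", whereas \z only matches at the very end of the string. The hyphen is escaped so it is read as a literal character rather than a range.

```csharp
using System.Text.RegularExpressions;

public static class LetterPatterns
{
    // ASCII letters only, either case; \z rejects a trailing newline.
    static readonly Regex AsciiLetters =
        new Regex(@"^[a-z]+\z", RegexOptions.IgnoreCase);

    // Any Unicode letter (\p{L}) plus space, underscore and a literal hyphen.
    static readonly Regex LettersWithSeparators =
        new Regex(@"^[\p{L} _\-]+\z");

    public static bool IsAsciiLetters(string s) => AsciiLetters.IsMatch(s);

    public static bool IsLettersWithSeparators(string s) => LettersWithSeparators.IsMatch(s);
}
```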

using System.Linq;
...
bool onlyAlphas = s.All(c => (c >= 'a' && c <= 'z') || (c >= 'A' && c <= 'Z'));

Something like this (have not tested) may fit your (vague) requirement.
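using System.Linq;
using System.Text.RegularExpressions;
...
if (input is string s) // binds the string so it can be passed to Regex and LINQ
{
    // test for legal characters?
    string pattern = "^[A-Za-z]+$";
    if (Regex.IsMatch(s, pattern))
    {
        // legal string? do something
    }
    // or
    if (s.Any(c => !char.IsLetter(c)))
    {
        // NOT legal string
    }
}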

Related

How to escape '-' (minus) with Regex? C# [duplicate]

I'm trying to write a method that replaces all occurrences of the characters in the input array (charsToReplace) with the replacementCharacter using regex. The version I have written does not work if the array contains any characters that may change the meaning of the regex pattern, such as ']' or '^'.
public static string ReplaceAll(string str, char[] charsToReplace, char replacementCharacter)
{
    if(str.IsNullOrEmpty())
    {
        return string.Empty;
    }
    var pattern = $"[{new string(charsToReplace)}]";
    return Regex.Replace(str, pattern, replacementCharacter.ToString());
}
So ReplaceAll("/]a", {'/', ']' }, 'a') should return "aaa".
Inside a character class, only 4 chars require escaping: ^, -, ] and \. You can't use Regex.Escape, because it does not escape - and ], as they are not "special" outside a character class. Note that Regex.Escape is meant to be used only for literal chars (or char sequences) that are outside character classes.
An unescaped ] char will close your character class prematurely and that is the main reason why your code does not work.
So, the fixed pattern variable definition can look like
var pattern = $"[{string.Concat(charsToReplace).Replace(@"\", @"\\").Replace("-", @"\-").Replace("^", @"\^").Replace("]", @"\]")}]";
See an online C# demo.
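Putting that escaping rule into a complete method: a minimal sketch, assuming the same ReplaceAll signature as the question. EscapeForCharClass is a hypothetical helper name, and a MatchEvaluator is used for the replacement so the replacement text is never parsed for substitution patterns such as $1.

```csharp
using System.Text;
using System.Text.RegularExpressions;

public static class CharClassReplace
{
    // Escapes the four characters that are special inside [...]: \ ] ^ -
    public static string EscapeForCharClass(char[] chars)
    {
        var sb = new StringBuilder();
        foreach (char c in chars)
        {
            if (c == '\\' || c == ']' || c == '^' || c == '-')
                sb.Append('\\');
            sb.Append(c);
        }
        return sb.ToString();
    }

    public static string ReplaceAll(string str, char[] charsToReplace, char replacementCharacter)
    {
        if (string.IsNullOrEmpty(str) || charsToReplace == null || charsToReplace.Length == 0)
            return str;

        string pattern = "[" + EscapeForCharClass(charsToReplace) + "]";
        string replacement = replacementCharacter.ToString();
        // The evaluator returns the replacement verbatim for every match.
        return Regex.Replace(str, pattern, m => replacement);
    }
}
```

Escaping the hyphen matters even in innocent-looking input: without it, the chars a, - and z would build "[a-z]", a range covering every lowercase letter.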
I suggest using Linq, not regular expressions:
using System.Linq;
...
public static string ReplaceAll(
    string str, char[] charsToReplace, char replacementCharacter)
{
    // Please, note IsNullOrEmpty syntax
    // we should validate charsToReplace as well
    if (string.IsNullOrEmpty(str) || null == charsToReplace || charsToReplace.Length <= 0)
        return str; // let's just do nothing (say, not turn null into empty string)

    return string.Concat(str.Select(c => charsToReplace.Contains(c)
        ? replacementCharacter
        : c));
}
If you insist on Regex, note that we should Regex.Escape the chars within charsToReplace. However, according to the documentation, Regex.Escape doesn't escape - and ], which have a special meaning within regular expression brackets, so those two are escaped by hand:
public static string ReplaceAll(
    string str, char[] charsToReplace, char replacementCharacter) {
    if (string.IsNullOrEmpty(str) || null == charsToReplace || charsToReplace.Length <= 0)
        return str;

    string charsSet = string.Concat(charsToReplace
        .Select(c => new char[] { ']', '-' }.Contains(c) // in case of '-' and ']'
            ? $@"\{c}"                                   // escape them as well
            : Regex.Escape(c.ToString())));

    return Regex.Replace(
        str,
        $"[{charsSet}]+",
        m => new string(replacementCharacter, m.Length));
}

Escape character in C#'s Split()

I am parsing some delimiter separated values, where ? is specified as the escape character in case the delimiter appears as part of one of the values.
For instance: if : is the delimiter, and a certain field the value 19:30, this needs to be written as 19?:30.
Currently, I use string[] values = input.Split(':'); in order to get an array of all values, but after learning about this escape character, this won't work anymore.
Is there a way to make Split take escape characters into account? I have checked the overload methods, and there does not seem to be such an option directly.
string[] substrings = Regex.Split("aa:bb:00?:99:zz", @"(?<!\?):");
which gives:
aa
bb
00?:99
zz
Or, since you probably want to unescape ?: at some point anyway, replace that sequence in the input with another token, split, and then replace the token back.
(This requires the System.Text.RegularExpressions namespace to be used.)
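A minimal sketch of split-then-unescape, assuming "?:" is the only escape sequence that occurs in the data (EscapedSplit and SplitAndUnescape are hypothetical names). It splits with the lookbehind pattern above and then turns each remaining "?:" back into a plain ":".

```csharp
using System.Text.RegularExpressions;

public static class EscapedSplit
{
    // Split on ':' only where it is not preceded by the escape character '?',
    // then unescape "?:" inside each resulting field.
    public static string[] SplitAndUnescape(string input)
    {
        string[] parts = Regex.Split(input, @"(?<!\?):");
        for (int i = 0; i < parts.Length; i++)
            parts[i] = parts[i].Replace("?:", ":");
        return parts;
    }
}
```

The lookbehind cannot tell an escaped escape character ("??" followed by ":") from an escaped delimiter, so if values may themselves end in "?", a character-by-character parser is the safer choice.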
This kind of stuff is always fun to code without using Regex.
The following does the trick with one single caveat: the escape character always escapes whatever follows it; there is no logic restricting it to the meaningful sequences (?; and ??). So, with ; as the separator and ? as the escape character, the string one?two;three??;four?;five will be split into onetwo, three?, four;five.
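using System.Collections.Generic;
using System.Text;

public static class StringExtensions
{
    public static IEnumerable<string> Split(this string text, char separator, char escapeCharacter, bool removeEmptyEntries)
    {
        var buffer = new StringBuilder();
        bool escape = false;
        foreach (var c in text)
        {
            if (!escape && c == separator)
            {
                // Unescaped separator: emit the current field.
                if (!removeEmptyEntries || buffer.Length > 0)
                {
                    yield return buffer.ToString();
                }
                buffer.Clear();
            }
            else if (!escape && c == escapeCharacter)
            {
                // Start of an escape sequence; the escape character itself is dropped.
                escape = true;
            }
            else
            {
                // Ordinary character, or the character right after an escape
                // (including an escaped separator), kept literally.
                buffer.Append(c);
                escape = false;
            }
        }
        // Emit the last field (also an empty one, unless empty entries are removed).
        if (!removeEmptyEntries || buffer.Length > 0)
        {
            yield return buffer.ToString();
        }
    }
}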
No, there's no way to do that. You will need to use a regex (which depends on how exactly you want your "escape character" to behave). In the worst case, I suppose you'll have to do the parsing manually.

Regex patterns C#

I want to validate a string in such a manner that, if a "-" is present in the string, it must have a letter before and after it.
But I am unable to form the regex pattern.
Can anyone please help me with this?
Rather than using a regex to check this I think I would write an extension method using Char.IsLetter(). You can handle multiple dashes then, and use languages other than English.
using System.Linq;
...
// Extension method: declare it inside a static class.
public static bool IsValidDashedString(this String text)
{
    // Retrieve the location of all the dashes
    var indexes = Enumerable.Range(0, text.Length)
        .Where(i => text[i] == '-')
        .ToList();

    // No dashes at all: nothing to validate
    if (indexes.Count == 0)
    {
        return true;
    }

    // A dash in the first or last position cannot have a letter on both sides
    if (indexes.Any(i => i == 0 || i == text.Length - 1))
    {
        return false;
    }

    // Check that each dash is preceded and followed by a letter
    return indexes.All(i => Char.IsLetter(text[i - 1]) && Char.IsLetter(text[i + 1]));
}
The following will match a string with one alphabetic character before the "-" and one after:
[A-Za-z]-[A-Za-z]
(Write the two ranges separately: [A-z] would also match the characters [ \ ] ^ _ and ` that sit between Z and a in ASCII.)
You may need to first test whether a "-" is present, if that is not always the case. More information about the possible string contents, and exactly why you need to perform the test, would help.
(^.+)(\D+)(-)(\D+)(.+)
I have tested this for some examples here http://regexr.com/39vfq
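Since the question asks for a regex, here is a sketch that checks every dash at once with lookarounds: it searches for a dash that is not preceded by a letter, or not followed by one, and treats the string as valid when no such dash exists. DashCheck and BadDash are hypothetical names, and \p{L} accepts letters from any alphabet (swap in [A-Za-z] for ASCII only).

```csharp
using System.Text.RegularExpressions;

public static class DashCheck
{
    // A misplaced dash: no letter before it, or no letter after it.
    static readonly Regex BadDash = new Regex(@"(?<!\p{L})-|-(?!\p{L})");

    // Valid when no misplaced dash exists; strings without a dash are valid.
    public static bool IsValidDashedString(string text) => !BadDash.IsMatch(text);
}
```

A dash at the very start or end fails automatically, because the lookbehind or lookahead finds no letter there.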

How can I test whether a string contains only hex characters in C#?

I have a long string (8000 characters) that should contain only hexadecimal and newline characters.
What is the best way to validate / verify that the string does not contain invalid characters?
Valid characters are: 0 through 9 and A through F. Newlines should be acceptable.
I began with this code, but it does not work properly (i.e. fails to return false when a "G" is the first character):
public static bool VerifyHex(string _hex)
{
    Regex r = new Regex(@"^[0-9A-F]+$", RegexOptions.Multiline);
    return r.Match(_hex).Success;
}
Another option, if you fancy using LINQ instead of regular expressions:
using System.Linq;
...
public static bool IsHex(string text)
{
    return text.All(IsHexCharOrNewLine);
}

private static bool IsHexCharOrNewLine(char c)
{
    return (c >= '0' && c <= '9') ||
           (c >= 'A' && c <= 'F') ||
           (c >= 'a' && c <= 'f') ||
           c == '\n'; // You may want to test for \r as well
}
Or:
public static bool IsHex(string text)
{
    return text.All(c => "0123456789abcdefABCDEF\n".Contains(c));
}
I think a regex is probably a better option in this case, but I wanted to just mention LINQ for the sake of interest :)
You're misunderstanding the Multiline option:
Use multiline mode, where ^ and
$ match the beginning and end of each line (instead of the beginning
and end of the input string).
Change it to
static readonly Regex r = new Regex(@"^[0-9A-F\r\n]+$");

public static bool VerifyHex(string _hex)
{
    return r.Match(_hex).Success;
}
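A small sketch that makes the Multiline pitfall visible by running the question's pattern and the corrected one side by side (HexCheck and VerifyHexMultiline are hypothetical names). With Multiline, one valid line is enough for a match, so "G1\nAB" passes because its second line is valid. The corrected version uses \z rather than $; since \r and \n are already allowed inside the class, both behave the same here.

```csharp
using System.Text.RegularExpressions;

public static class HexCheck
{
    // Whole-string check: one bad character anywhere makes the match fail.
    static readonly Regex WholeStringHex = new Regex(@"^[0-9A-F\r\n]+\z");

    public static bool VerifyHex(string hex) => WholeStringHex.IsMatch(hex);

    // The question's version, kept for comparison.
    public static bool VerifyHexMultiline(string hex) =>
        new Regex(@"^[0-9A-F]+$", RegexOptions.Multiline).IsMatch(hex);
}
```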
There are already some great answers, but no one has mentioned using the built-in parsing, which seems to be the most straightforward way:
public bool IsHexString(string hexString)
{
    System.Globalization.CultureInfo provider = new System.Globalization.CultureInfo("en-US");
    int output = 0;
    return Int32.TryParse(hexString, System.Globalization.NumberStyles.HexNumber, provider, out output);
}
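One limit of the parsing approach: the parsed value has to fit in an Int32, so at most 8 hex digits are accepted, and NumberStyles.HexNumber only tolerates leading and trailing whitespace, not embedded newlines. It therefore cannot validate an 8000-character string like the one in the question. A sketch showing this, plus a length-independent per-character check; IsHexText is a hypothetical name, and Uri.IsHexDigit also accepts lowercase a-f, which is looser than the question's A-F.

```csharp
using System;
using System.Globalization;

public static class HexParseLimits
{
    // Succeeds only when the whole string parses to a 32-bit value.
    public static bool FitsInt32Hex(string s) =>
        int.TryParse(s, NumberStyles.HexNumber, CultureInfo.InvariantCulture, out _);

    // Checks each character instead of parsing, so length does not matter.
    public static bool IsHexText(string s)
    {
        if (s.Length == 0) return false;
        foreach (char c in s)
        {
            if (!Uri.IsHexDigit(c) && c != '\r' && c != '\n')
                return false;
        }
        return true;
    }
}
```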

How do I verify that a string is in English?

I read a string from the console. How do I make sure it only contains English characters and digits?
Assuming that by "English characters" you are simply referring to the 26-character Latin alphabet, this would be an area where I would use regular expressions: ^[a-zA-Z0-9]*$ (add a space inside the brackets if spaces should be allowed).
For example:
if( Regex.IsMatch(Console.ReadLine(), "^[a-zA-Z0-9]*$") )
{ /* your code */ }
The benefit of regular expressions in this case is that all you really care about is whether or not a string matches a pattern, and this is where regular expressions work wonderfully. The expression clearly captures your intent, and it's easy to extend if your definition of "English characters" expands beyond just the 26 alphabetic ones.
There's a decent series of articles here that teach more about regular expressions.
Jørn Schou-Rode's answer provides a great explanation of how the regular expression presented here works to match your input.
You could match it against this regular expression: ^[a-zA-Z0-9]*$
^ matches the start of the string (i.e. no characters are allowed before this point)
[a-zA-Z0-9] matches any letter from a-z in lower or upper case, as well as digits 0-9
* lets the previous match repeat zero or more times
$ matches the end of the string (i.e. no characters are allowed after this point)
To use the expression in a C# program, you will need a using directive for System.Text.RegularExpressions, and then something like this in your code:
bool match = Regex.IsMatch(input, "^[a-zA-Z0-9]*$");
If you are going to test a lot of lines against the pattern, you might want to compile the expression:
Regex pattern = new Regex("^[a-zA-Z0-9]*$", RegexOptions.Compiled);
for (int i = 0; i < 1000; i++)
{
    string input = Console.ReadLine() ?? string.Empty; // ReadLine returns null at end of input
    bool isMatch = pattern.IsMatch(input);
}
The accepted answer does not work for white space or punctuation. The code below is tested against this input:
Hello: 1. - a; b/c \ _(5)??
(Is English)
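// Note: " -_" inside the brackets is a range from space (U+0020) to underscore (U+005F),
// which is what lets characters such as ! ( ) / \ ; : through.
Regex regex = new Regex("^[a-zA-Z0-9. -_?]*$");
string text1 = "سلام";
bool fls = regex.IsMatch(text1); //false
string text2 = "123 abc! ?? -_)(/\\;:";
bool tru = regex.IsMatch(text2); //true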
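To see what that unescaped hyphen does, compare the pattern above with one where the hyphen is escaped (RangePitfall, Accidental and Explicit are hypothetical names). With the escape, only the listed punctuation is accepted, so if ! ( ) / \ ; : should be allowed they have to be added to the brackets explicitly.

```csharp
using System.Text.RegularExpressions;

public static class RangePitfall
{
    // Same pattern as above: " -_" is the range U+0020..U+005F.
    public static readonly Regex Accidental = new Regex("^[a-zA-Z0-9. -_?]*$");

    // Hyphen escaped: allows letters, digits, '.', ' ', '-', '_' and '?' only.
    public static readonly Regex Explicit = new Regex(@"^[a-zA-Z0-9. \-_?]*$");
}
```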
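Another way is to check that every character is lowercase, uppercase, a digit, or white space. Note that this only rejects letters that have no case (such as Arabic or Persian script); cased letters from other alphabets, such as Cyrillic or accented Latin letters, still pass.
Something like: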
private bool IsAllCharEnglish(string Input)
{
    foreach (var item in Input)
    {
        if (!char.IsLower(item) && !char.IsUpper(item) && !char.IsDigit(item) && !char.IsWhiteSpace(item))
        {
            return false;
        }
    }
    return true;
}
And to use it:
string str = "فارسی abc";
IsAllCharEnglish(str); // return false
str = "These are english 123";
IsAllCharEnglish(str); // return true
Do not use Regex or LINQ here: according to a performance test, they are slower than a plain loop over the characters of the string.
My solution:
private static bool is_only_eng_letters_and_digits(string str)
{
    foreach (char ch in str)
    {
        if (!(ch >= 'A' && ch <= 'Z') && !(ch >= 'a' && ch <= 'z') && !(ch >= '0' && ch <= '9'))
        {
            return false;
        }
    }
    return true;
}
Do you have web access? I would assume that cannot be guaranteed, but Google has a language API (google language api) that will detect the language of the text you pass to it.
bool onlyEnglishCharacters = !EnglishText.Any(a => a > '~'); // '~' (U+007E) is the last printable ASCII character
It seems cheap, but it worked for me: a legitimately easy answer.
Hope it helps someone.
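bool AllAscii(string str)
{
    // Char.IsLetterOrDigit alone also accepts non-Latin letters and digits,
    // so additionally require every char to be in the ASCII range (< 128).
    return str.All(c => c < 128 && Char.IsLetterOrDigit(c));
}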
Something like this (if you want to control input):
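static string ReadLettersAndDigits() {
    StringBuilder sb = new StringBuilder(); // requires using System.Text;
    ConsoleKeyInfo keyInfo;
    while ((keyInfo = Console.ReadKey(true)).Key != ConsoleKey.Enter) {
        // Invariant lowercasing avoids culture surprises such as Turkish 'I'
        char c = char.ToLowerInvariant(keyInfo.KeyChar);
        // Accept only ASCII letters and ASCII digits; other keys are ignored
        if (('a' <= c && c <= 'z') || ('0' <= c && c <= '9')) {
            sb.Append(keyInfo.KeyChar);
            Console.Write(keyInfo.KeyChar); // echo exactly what was stored
        }
    }
    return sb.ToString();
}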
If you don't want to use Regex, and just to provide an alternative solution: you can check the character code of each character, and if it lies within the right range, it is either an English letter or a digit (this might not be the best solution):
foreach (char ch in str)
{
    int x = ch; // the UTF-16 code of the character (same as ASCII for these ranges)
    if ((x >= 65 && x <= 90) || (x >= 97 && x <= 122))
    {
        //this is an English letter, i.e. A-Z or a-z
    }
    else if (x >= 48 && x <= 57)
    {
        //this is a digit, 0-9
    }
    else
    {
        //this is something different
    }
}
http://en.wikipedia.org/wiki/ASCII for full ASCII table.
But I still think, RegEx is the best solution.
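I agree with the regular expression answers. However, you could simplify the pattern to just "^[\w]+$". \w is any "word character". Note that in .NET it matches Unicode word characters by default (letters from any alphabet, digits, and connector punctuation such as _); it is equivalent to [a-zA-Z_0-9] only when RegexOptions.ECMAScript is specified. I don't know whether you want underscores as well.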
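A small sketch of that difference (WordCharDemo and its methods are hypothetical names): the same \w pattern accepts an accented Latin letter by default but rejects it once RegexOptions.ECMAScript is set.

```csharp
using System.Text.RegularExpressions;

public static class WordCharDemo
{
    // Default .NET behaviour: \w covers Unicode letters, digits and '_'.
    public static bool IsWordDefault(string s) => Regex.IsMatch(s, @"^\w+$");

    // With RegexOptions.ECMAScript, \w is limited to [a-zA-Z_0-9].
    public static bool IsWordEcma(string s) =>
        Regex.IsMatch(s, @"^\w+$", RegexOptions.ECMAScript);
}
```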
More on regexes in .net here: http://msdn.microsoft.com/en-us/library/ms972966.aspx#regexnet_topic8
As many have pointed out, the accepted answer works only if there is a single word in the string. As no answer covers the case of multiple words or even sentences, here is code that returns true when the string contains a letter outside the English alphabet:
bool containsNonEnglishLetter = stringToCheck.Any(x => char.IsLetter(x) && !((int)x >= 63 && (int)x <= 126));
<?php
$string="हिन्दी";
$string="Manvendra Rajpurohit";
echo strlen($string); echo '<br>';
echo mb_strlen($string, 'utf-8');
echo '<br>';
if(strlen($string) != mb_strlen($string, 'utf-8'))
{
echo "Please enter English words only:(";
}
else {
echo "OK, English Detected!";
}
?>
