Pattern matching problem in C#

Pattern matching problem in C# - c#

I have a string like "AAA 101 B202 C 303 " and I want to get rid of the space between char and number if there is any.
So after operation, the string should be like "AAA101 B202 C303 ". But I am not sure whether regex could do this?
Any help? Thanks in advance.

Yes, you can do this with regular expressions. Here's a short but complete example:
using System;
using System.Text.RegularExpressions;
class Test
{
static void Main()
{
string text = "A 101 B202 C 303 ";
string output = Regex.Replace(text, #"(\p{L}) (\d)", #"$1$2");
Console.WriteLine(output); // Prints A101 B202 C303
}
}
(If you're going to do this a lot, you may well want to compile a regular expression for the pattern.)
The \p{L} matches any unicode letter - you may want to be more restrictive.

You can do something like
([A-Z]+)\s?(\d+)
And replace with
$1$2
The expression can be tightened up, but the above should work for your example input string.
What it does is declaring a group containing letters (first set of parantheses), then an optional space (\s?), and then a group of digits (\d+). The groups can be used in the replacement by referring to their index, so when you want to get rid of the space, just replace with $1$2.

While not as concise as Regex, the C# code for something like this is fairly straightforward and very fast-running:
StringBuilder sb = new StringBuilder();
for(int i=0; i<s.Length; i++)
{
// exclude spaces preceeded by a letter and succeeded by a number
if(!(s[i] == ' '
&& i-1 >= 0 && IsLetter(s[i-1])
&& i+1 < s.Length && IsNumber(s[i+1])))
{
sb.Append(s[i]);
}
}
return sb.ToString();

Just for fun (because the act of programming is/should be fun sometimes) :o) I'm using LINQ with Aggregate:
var result = text.Aggregate(
string.Empty,
(acc, c) => char.IsLetter(acc.LastOrDefault()) && Char.IsDigit(c) ?
acc + c.ToString() : acc + (char.IsWhiteSpace(c) && char.IsLetter(acc.LastOrDefault()) ?
string.Empty : c.ToString())).TrimEnd();

Related

Remove anything from string after any "a-zA-Z" char

I have this types of string:
"10a10", "10b5641", "5a1121", "438z2a5f"
and I need to remove anything after the FIRST a-zA-Z char in the string (the symbol itself should be removed as well). What could be a solution?
Examples of results I expect:
"10a10" returns "10"
"10b5641" returns "10"
"5a1121" returns "5"
"438z2a5f" returns "438"

You could use Regular Expressions along with Regex, something like:
string str = "10a10";
str = Regex.Replace(str, #"[a-zA-Z].*", "");
Console.WriteLine(str);
will output:
10
Basically it will takes everything that starts with a-zA-Z and everything after it (.* matches any characters zero or unlimited times) and remove it from the string.

An easy to understand approach would be to use the String.IndexOfAny Method to find the Index of the first a-zA-Z char, and then use the String.Substring Method to cut the string accordingly.
To do so you would create an array containing all a-zA-Z characters and use this as an argument to String.IndexOfAny. After that you use 0 and the result of String.IndexOfAny as arguments for String.Substring.
I am pretty sure there are more elegant ways to do this, but this seems the most basic approach to me, so its worth mentioning.

You could do so using Linq as follows.
var result = new string(strInput.TakeWhile(x => !char.IsLetter(x)).ToArray());

var sList = new List<string> { "10a10", "10b5641", "5a1121", "438z2a5f" };
foreach (string s in sList.ToArray())
{
string number = new string(s.TakeWhile(c => !Char.IsLetter(c)).ToArray());
Console.WriteLine(number);
}

Either Linq:
var result = string.Concat(strInput
.TakeWhile(c => !((c >= 'a' && c <= 'z') || (c >= 'A' && c <= 'Z')));
Or regular expression:
using System.Text.RegularExpressions;
...
var result = Regex.Match(strInput, "^[^A-Za-z]*").Value;
In both cases starting from strInput beginning take characters until a..z or A-Z occurred
Demo:
string[] tests = new[] {
"10a10", "10b5641", "5a1121", "438z2a5f"
};
string demo = string.Join(Environment.NewLine, tests
.Select(test => $"{test,-10} returns \"{Regex.Match(test, "^[^A-Za-z]*").Value}\""));
Console.Write(demo);
Outcome:
10a10 returns "10"
10b5641 returns "10"
5a1121 returns "5"
438z2a5f returns "438"

Get a number and string from string

I have a kinda simple problem, but I want to solve it in the best way possible. Basically, I have a string in this kind of format: <some letters><some numbers>, i.e. q1 or qwe12. What I want to do is get two strings from that (then I can convert the number part to an integer, or not, whatever). The first one being the "string part" of the given string, so i.e. qwe and the second one would be the "number part", so 12. And there won't be a situation where the numbers and letters are being mixed up, like qw1e2.
Of course, I know, that I can use a StringBuilder and then go with a for loop and check every character if it is a digit or a letter. Easy. But I think it is not a really clear solution, so I am asking you is there a way, a built-in method or something like this, to do this in 1-3 lines? Or just without using a loop?

You can use a regular expression with named groups to identify the different parts of the string you are interested in.
For example:
string input = "qew123";
var match = Regex.Match(input, "(?<letters>[a-zA-Z]+)(?<numbers>[0-9]+)");
if (match.Success)
{
Console.WriteLine(match.Groups["letters"]);
Console.WriteLine(match.Groups["numbers"]);
}

You can try Linq as an alternative to regular expressions:
string source = "qwe12";
string letters = string.Concat(source.TakeWhile(c => c < '0' || c > '9'));
string digits = string.Concat(source.SkipWhile(c => c < '0' || c > '9'));

You can use the Where() extension method from System.Linq library (https://learn.microsoft.com/en-us/dotnet/api/system.linq.enumerable.where), to filter only chars that are digit (number), and convert the resulting IEnumerable that contains all the digits to an array of chars, that can be used to create a new string:
string source = "qwe12";
string stringPart = new string(source.Where(c => !Char.IsDigit(c)).ToArray());
string numberPart = new string(source.Where(Char.IsDigit).ToArray());
MessageBox.Show($"String part: '{stringPart}', Number part: '{numberPart}'");
Source:
https://stackoverflow.com/a/15669520/8133067

if possible add a space between the letters and numbers (q 3, zet 64 etc.) and use string.split
otherwise, use the for loop, it isn't that hard

You can test as part of an aggregation:
var z = "qwe12345";
var b = z.Aggregate(new []{"", ""}, (acc, s) => {
if (Char.IsDigit(s)) {
acc[1] += s;
} else {
acc[0] += s;
}
return acc;
});
Assert.Equal(new [] {"qwe", "12345"}, b);

Using regex on a specific setup

I know a bit about regular expressions, but far from enough to figure out this one.
I have tried to see if I could find something that could help me, but I got a hard time understanding how to construct the REGEX expression in c#.
Here is what I need.If I have a string like the following.
string s = "this is (a (string))"
What I need is to focus on the parentheses.
I want to be able to split this string up into the following List/Array "parts".
1) "this", "is", "a (string)"
or
2) "this", "is", "(a (string))".
would both like how to do it with 1) and 2). Anyone got an idea of how to solve this problem?
Can this be solved using REGEX? Anyone knows a good guide to learn about it?
Hope someone can help.
Greetings.

If you want to split with some kind of escape (do not count for space if it's within parentheses) you
can easily implement something like this, easy loop without regular expressions:
private static IEnumerable<String> SplitWithEscape(String source) {
if (String.IsNullOrEmpty(source))
yield break;
int escapeCount = 0;
int start = 0;
for (int i = 0; i < source.Length; ++i) {
char ch = source[i];
if (escapeCount > 0) {
if (ch == '(')
escapeCount += 1;
else if (ch == ')')
escapeCount -= 1;
}
else {
if (ch == ' ') {
yield return source.Substring(start, i - start);
start = i;
}
else if (ch == '(')
escapeCount += 1;
}
}
if ((start < source.Length - 1) && (escapeCount == 0))
yield return source.Substring(start);
}
....
String source = "this is (a (string))";
String[] split = SplitWithEscape(source).ToArray();
Console.Write(String.Join("; ", split));

You can try something like this:
([^\(\s]+)\s+([^\(\s]+)\s+\((.*)\)
Regex Demo
But this will only match with fixed number of words in your input string, in this case, two words before the parentheses. The final regex will depend on what are your specifications.

.NET regex supports balanced constructs. Thus, you can always safely use .NET regex to match substrings between a balanced number of delimiters that may have something inside them.
So, you can use
\(((?>[^()]+|\((?<o>)|\)(?<-o>))*(?(o)(?!)))\)|\S+
to match parenthesized substrings (while capturing the contents in-between parentheses into Group 1) or match all non-whitespace chunks (\S+ matches 1+ non-whitespace symbols).
See Grouping Constructs in Regular Expressions, Matching Nested Constructs with Balancing Groups or What are regular expression Balancing Groups? for more details on how balancing groups work.
Here is a regex demo
If you need to extract all the match values and captured values, you need to get all matched groups that are not empty or whitespace. So, use this C# code:
var line = "this is (a (string))";
var pattern = #"\(((?>[^()]+|\((?<o>)|\)(?<-o>))*(?(o)(?!)))\)|\S+";
var result = Regex.Matches(line, pattern)
.Cast<Match>()
.SelectMany(x => x.Groups.Cast<Group>()
.Where(m => !string.IsNullOrWhiteSpace(m.Value))
.Select(t => t.Value))
.ToList();
foreach (var s in result) // DEMO
Console.WriteLine(s);

Maybe you can use ((?<=\()[^}]*(?=\)))|\W+ to split in words and then get the content in the group 1...
See this Regex

get measurement value only from string

I have a string which gives the measurement followed the units in either cm, m or inches.
For example :
The number could be 112cm, 1.12m, 45inches or 45in.
I would like to extract only the number part of the string. Any idea how to use the units as the delimiters to extract the number ?
While I am at it, I would like to ignore the case of the units.
Thanks

You can try:
string numberMatch = Regex.Match(measurement, #"\d+\.?\d*").Value;
EDIT
Furthermore, converting this to a double is trivial:
double result;
if (double.TryParse(number, out result))
{
// Yeiiii I've got myself a double ...
}

Use String.Split http://msdn.microsoft.com/en-us/library/tabh47cf.aspx
Something like:
var units = new[] {"cm", "inches", "in", "m"};
var splitnumber = mynumberstring.Split(units, StringSplitOptions.RemoveEmptyEntries);
var number = Convert.ToInt32(splitnumber[0]);

Using Regex this can help you out:
(?i)(\d+(?:\.\d+)?)(?=c?m|in(?:ch(?:es)?)?)
Break up:
(?i) = ignores characters case // specify it in C#, live do not have it
\d+(\.\d+)? = supports numbers like 2, 2.25 etc
(?=c?m|in(ch(es)?)?) = positive lookahead, check units after the number if they are
m, cm,in,inch,inches, it allows otherwise it is not.
?: = specifies that the group will not capture
? = specifies the preceding character or group is optional
Demo
EDIT
Sample code:
MatchCollection mcol = Regex.Matches(sampleStr,#"(?i)(\d+(?:\.\d+)?)(?=c?m|in(?:ch(?:es)?)?)")
foreach(Match m in mcol)
{
Debug.Print(m.ToString()); // see output window
}

I guess I'd try to replace with "" every character that is not number or ".":
//s is the string you need to convert
string tmp=s;
foreach (char c in s.ToCharArray())
{
if (!(c >= '0' && c <= '9') && !(c =='.'))
tmp = tmp.Replace(c.ToString(), "");
}
s=tmp;

Try using regular expression \d+ to find an integer number.
resultString = Regex.Match(measurementunit , #"\d+").Value;

Is it a requirement that you use the unit as the delimiter? If not, you could extract the number using regex (see Find and extract a number from a string).

How do I verify that a string is in English?

I read a string from the console. How do I make sure it only contains English characters and digits?

Assuming that by "English characters" you are simply referring to the 26-character Latin alphabet, this would be an area where I would use regular expressions: ^[a-zA-Z0-9 ]*$
For example:
if( Regex.IsMatch(Console.ReadLine(), "^[a-zA-Z0-9]*$") )
{ /* your code */ }
The benefit of regular expressions in this case is that all you really care about is whether or not a string matches a pattern - this is one where regular expressions work wonderfully. It clearly captures your intent, and it's easy to extend if you definition of "English characters" expands beyond just the 26 alphabetic ones.
There's a decent series of articles here that teach more about regular expressions.
Jørn Schou-Rode's answer provides a great explanation of how the regular expression presented here works to match your input.

You could match it against this regular expression: ^[a-zA-Z0-9]*$
^ matches the start of the string (ie no characters are allowed before this point)
[a-zA-Z0-9] matches any letter from a-z in lower or upper case, as well as digits 0-9
* lets the previous match repeat zero or more times
$ matches the end of the string (ie no characters are allowed after this point)
To use the expression in a C# program, you will need to import System.Text.RegularExpressions and do something like this in your code:
bool match = Regex.IsMatch(input, "^[a-zA-Z0-9]*$");
If you are going to test a lot of lines against the pattern, you might want to compile the expression:
Regex pattern = new Regex("^[a-zA-Z0-9]*$", RegexOptions.Compiled);
for (int i = 0; i < 1000; i++)
{
string input = Console.ReadLine();
pattern.IsMatch(input);
}

The accepted answer does not work for the white spaces or punctuation. Below code is tested for this input:
Hello: 1. - a; b/c \ _(5)??
(Is English)
Regex regex = new Regex("^[a-zA-Z0-9. -_?]*$");
string text1 = "سلام";
bool fls = regex.IsMatch(text1); //false
string text2 = "123 abc! ?? -_)(/\\;:";
bool tru = regex.IsMatch(text2); //true

One other way is to check if IsLower and IsUpper both doesn't return true.
Something like :
private bool IsAllCharEnglish(string Input)
{
foreach (var item in Input.ToCharArray())
{
if (!char.IsLower(item) && !char.IsUpper(item) && !char.IsDigit(item) && !char.IsWhiteSpace(item))
{
return false;
}
}
return true;
}
and for use it :
string str = "فارسی abc";
IsAllCharEnglish(str); // return false
str = "These are english 123";
IsAllCharEnglish(str); // return true

Do not use RegEx and LINQ they are slower than the loop by characters of string
Performance test
My solution:
private static bool is_only_eng_letters_and_digits(string str)
{
foreach (char ch in str)
{
if (!(ch >= 'A' && ch <= 'Z') && !(ch >= 'a' && ch <= 'z') && !(ch >= '0' && ch <= '9'))
{
return false;
}
}
return true;
}

do you have web access? i would assume that cannot be guaranteed, but Google has a language api that will detect the language you pass to it.
google language api

bool onlyEnglishCharacters = !EnglishText.Any(a => a > '~');
Seems cheap, but it worked for me, legit easy answer.
Hope it helps anyone.

bool AllAscii(string str)
{
return !str.Any(c => !Char.IsLetterOrDigit(c));
}

Something like this (if you want to control input):
static string ReadLettersAndDigits() {
StringBuilder sb = new StringBuilder();
ConsoleKeyInfo keyInfo;
while ((keyInfo = Console.ReadKey(true)).Key != ConsoleKey.Enter) {
char c = char.ToLower(keyInfo.KeyChar);
if (('a' <= c && c <= 'z') || char.IsDigit(c)) {
sb.Append(keyInfo.KeyChar);
Console.Write(c);
}
}
return sb.ToString();
}

If i dont wnat to use RegEx, and just to provide an alternate solution, you can just check the ASCII code of each character and if it lies between that range, it would either be a english letter or a number (This might not be the best solution):
foreach (char ch in str.ToCharArray())
{
int x = (int)char;
if (x >= 63 and x <= 126)
{
//this is english letter, i.e.- A, B, C, a, b, c...
}
else if(x >= 48 and x <= 57)
{
//this is number
}
else
{
//this is something diffrent
}
}
http://en.wikipedia.org/wiki/ASCII for full ASCII table.
But I still think, RegEx is the best solution.

I agree with the Regular Expression answers. However, you could simplify it to just "^[\w]+$". \w is any "word character" (which translates to [a-zA-Z_0-9] if you use a non-unicode alphabet. I don't know if you want underscores as well.
More on regexes in .net here: http://msdn.microsoft.com/en-us/library/ms972966.aspx#regexnet_topic8

As many pointed out, accepted answer works only if there is a single word in the string. As there are no answers that cover the case of multiple words or even sentences in the string, here is the code:
stringToCheck.Any(x=> char.IsLetter(x) && !((int)x >= 63 && (int)x <= 126));

<?php
$string="हिन्दी";
$string="Manvendra Rajpurohit";
echo strlen($string); echo '<br>';
echo mb_strlen($string, 'utf-8');
echo '<br>';
if(strlen($string) != mb_strlen($string, 'utf-8'))
{
echo "Please enter English words only:(";
}
else {
echo "OK, English Detected!";
}
?>

Develop Reference

C# (C-Sharp) is a programming language developed by Microsoft that runs on the .NET Framework.

Pattern matching problem in C# - c#

I have a string like "AAA 101 B202 C 303 " and I want to get rid of the space between char and number if there is any. So after operation, the string should be like "AAA101 B202 C303 ". But I am not sure whether regex could do this? Any help? Thanks in advance.

Related

Remove anything from string after any "a-zA-Z" char

Get a number and string from string

Using regex on a specific setup

get measurement value only from string

How do I verify that a string is in English?

Categories

Resources