How do I simplify this pattern matching logic using C# Regex?

Good morning! Hoping someone can help me out here with some pattern matching.
What I want to do is match a string of numbers against a bunch of text. The only catch is that I DO NOT want to match anything that has more numbers to the left and/or right of the number I'm looking for (letters are fine).
Here is some code that works, but it seems that having three IsMatch calls is overkill. Problem is, I can't figure out how to reduce it to just one IsMatch call.
static void Main(string[] args)
{
List<string> list = new List<string>();
list.Add("cm1312nfi"); // WANT
list.Add("cm1312"); // WANT
list.Add("cm1312n"); // WANT
list.Add("1312"); // WANT
list.Add("13123456"); // DON'T WANT
list.Add("56781312"); // DON'T WANT
list.Add("56781312444"); // DON'T WANT
list.Add(" cm1312nfi "); // WANT
list.Add(" cm1312 "); // WANT
list.Add("cm1312n "); // WANT
list.Add(" 1312"); // WANT
list.Add(" 13123456"); // DON'T WANT
list.Add(" 56781312 "); // DON'T WANT
foreach (string s in list)
{
// Can we reduce this to just one IsMatch() call???
if (s.Contains("1312") && !(Regex.IsMatch(s, @"\b[0-9]+1312[0-9]+\b") || Regex.IsMatch(s, @"\b[0-9]+1312\b") || Regex.IsMatch(s, @"\b1312[0-9]+\b")))
{
Console.WriteLine("'{0}' is a match for '1312'", s);
}
else
{
Console.WriteLine("'{0}' is NOT a match for '1312'", s);
}
}
}
Thank you in advance for any help you can provide!
~Mr. Spock

You can use negative lookarounds for a single check:
@"(?<![0-9])1312(?![0-9])"
(?<![0-9]) makes sure that 1312 doesn't have a digit before it, (?![0-9]) makes sure there's no digit after 1312.
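As a rough check, the lookaround pattern can be run against a few of the question's sample strings. This is only a minimal sketch; the names DigitBoundaryDemo and IsStandalone1312 are made up for illustration and are not part of the original answer.

```csharp
using System;
using System.Text.RegularExpressions;

static class DigitBoundaryDemo
{
    // True when "1312" occurs with no digit directly before or after it.
    public static bool IsStandalone1312(string s)
    {
        return Regex.IsMatch(s, @"(?<![0-9])1312(?![0-9])");
    }

    static void Main()
    {
        string[] samples = { "cm1312nfi", "1312", " cm1312 ", "13123456", "56781312", " 56781312444 " };
        foreach (string s in samples)
        {
            Console.WriteLine("'{0}' -> {1}", s, IsStandalone1312(s) ? "match" : "no match");
        }
    }
}
```

Because the lookarounds only inspect the neighbouring characters, no separate s.Contains("1312") call is needed.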

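You can fold the three checks into a single pattern. Making both digit runs optional with [0-9]* would also reject a bare "1312", so require at least one extra digit on one side or the other:
if (s.Contains("1312") && !Regex.IsMatch(s, @"\b(?:[0-9]+1312[0-9]*|1312[0-9]+)\b"))
{
....
Have a look at the amazing Regexplained: http://tinyurl.com/q62uqr3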

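To catch invalid patterns use:
Regex.IsMatch(s, @"\b[0-9]*1312[0-9]*\b")
Be aware that this also matches a standalone "1312" (both digit runs may be empty), so it cannot simply replace all three original checks.
Also, [0-9] can be replaced with \d, although in .NET \d matches any Unicode decimal digit unless RegexOptions.ECMAScript is set.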

You can match only letters (or nothing at all) before and after the number:
@"\b[a-zA-Z]*1312[a-zA-Z]*\b"

For curious minds, here is another approach to the problem:
foreach (string s in list)
{
    // Remove all characters other than digits.
    // The foreach variable is read-only, so store the result in a separate local.
    string digits = Regex.Replace(s, "[^0-9]", "");
    if (digits.Contains("1312") && CheckMatch(digits))
    {
        Console.WriteLine("'{0}' is a match for '1312'", s);
    }
    else
    {
        Console.WriteLine("'{0}' is NOT a match for '1312'", s);
    }
}
private static bool CheckMatch(string s)
{
    int index = s.IndexOf("1312");
    // Check whether the number of digits to the left of "1312" equals the number to its right
    return index == s.Substring(index).Length - 4;
}
This treats "131213121312" as not a match. Note that it also accepts any input with equally many extra digits on each side, such as "56131278", so it is not a general solution.

Related

Validating user only enters a one word answer and that the first letter is capitalized

I'm currently teaching myself C# and have a pretty decent grasp of it, but I'm lost on how to validate that the user enters only a one-word answer and that it's capitalized; otherwise I want to give them another chance to try.
This is what I have so far:
static void Main(string[] args)
{
//assigned variable
string userInput;
//initializing empty string
string answerInput = string.Empty;
//Creating loop
while ((answerInput == string.Empty) || (answerInput == "-1"))
{
//This is asking the question to the user
Console.WriteLine("Enter your favorite animal: ");
//this is storing users input
userInput = Console.ReadLine();
//using function to validate response
answerInput = letterFunc(userInput);
}
}
//Creating function to only allow letters and making sure it's not left blank.
private static string letterFunc (string validate)
{
//initializing empty string
string returnString = string.Empty;
//validating it is not left empty
if(validate.Length > 0)
{
//iterating through the passed string
foreach(char c in validate)
{
//using the asciitable to validate they only use A-Z, a-z, and space
if ((((Convert.ToInt32(c)) > 64) && ((Convert.ToInt32(c)) < 91)) || (((Convert.ToInt32(c)) > 96) && ((Convert.ToInt32(c)) < 123)) || (Convert.ToInt32(c) == 32))
{
//appending sanitized character to return string
returnString += c;
}
else
{
//If they try to enter a number this will warn them
Console.WriteLine("Invalid input. Use letters only.");
}
}
}
else
{
//If user enters a blank input, this will warn them
Console.WriteLine("You cannot enter a blank response.");
}
//returning string
return returnString;
}
I was wondering if it's possible to do this inside the function I created (the one that validates that they only use letters and that the input isn't empty), ideally with a detailed explanation. Thanks.

Regular expressions would be the standard way to do this, but looking at your code I don't think you're ready for them yet. This is not an insult by the way -- everyone was a beginner at some point!
Whenever you approach a problem like this, first make sure all your requirements are well-defined and specific:
One-word answer: In your code, you've defined it as "an answer that contains only letters and spaces". This might not be ideal, as it prevents people from entering answers like the dik-dik as their favorite animal. But let's stick with it for now.
Capitalized answer: Let's define this as "an answer where the first character is a capital letter".
So taking the two requirements together, we're trying to validate that the answer starts with a capital letter and contains only letters and spaces.
While coding, look at the language and framework you're using to see if there are convenience methods that can help you. .NET has tons of these. We know we'll have to check the individual characters of a String, and a string is made up of Chars, so let's google "c# char type". Looking at the MSDN page for System.Char, we can see a couple methods that might help us out:
Char.IsWhiteSpace - tests whether a character is 'whitespace' (space, tab, newline)
Char.IsLetter - tests whether a character is a letter.
Char.IsUpper - tests whether a character is an uppercase letter.
So let's look at our requirements again and implement them one at a time: "starts with a capital letter and contains only letters and spaces".
Let's call our user-input string answer. We can check that the first letter is a capital like this (note we also make sure it HAS a first letter):
bool isCapitalized = answer.Length > 0 && Char.IsUpper( answer[0] );
We can check that it contains only letters and spaces like this:
bool containsOnlyLettersAndSpaces = true;
foreach( char c in answer )
{
if( !Char.IsLetter( c ) && !Char.IsWhiteSpace( c ) )
{
containsOnlyLettersAndSpaces = false;
break;
}
}
containsOnlyLettersAndSpaces starts as true. We then look at each character in the string. If we find a character that is not a letter AND is not whitespace, we set containsOnlyLettersAndSpaces to false and stop checking. We know the answer is invalid at that point, so there is no reason to check the rest of it!
Now we can return our answer as a combination of the two validations:
return isCapitalized && containsOnlyLettersAndSpaces;
Here's the whole method:
private bool IsValidAnimal( string answer )
{
bool isCapitalized = answer.Length > 0 && Char.IsUpper( answer[0] );
bool containsOnlyLettersAndSpaces = true;
foreach( char c in answer )
{
if( !Char.IsLetter( c ) && !Char.IsWhiteSpace( c ) )
{
containsOnlyLettersAndSpaces = false;
break;
}
}
return isCapitalized && containsOnlyLettersAndSpaces;
}
Good luck with learning C#, and I hope this has helped you think about how to code!

Regular expressions are not that hard. Problem is that sometimes you want to achieve something more complex, but that's not your case:
private static string letterFunc (string validate)
{
return new Regex("^[A-Z][a-z]*$").IsMatch(validate) ?
validate :
string.Empty;
}
Explaining the expression:
^ - Anchor: only matches when the text begins with the expression
[A-Z] - Exactly one character, from A to Z
[a-z]* - Zero or more characters, from a to z
$ - Anchor: only matches when the text ends with the expression
By using the two anchors, we want the full text to match the expression, not parts of it.
If you want to allow capitals after the first letter (like CaT or DoG), you can change it to:
^[A-Z][a-zA-Z]*$
Play with Regex: https://regex101.com/r/zwLO6I/2
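To see how the two anchored patterns differ, here is a small sketch that runs both against a handful of inputs. The class name AnimalPattern and the helper names IsStrict and IsRelaxed are assumptions made up for this illustration.

```csharp
using System;
using System.Text.RegularExpressions;

static class AnimalPattern
{
    // One capital letter followed by lowercase letters only.
    static readonly Regex Strict = new Regex("^[A-Z][a-z]*$");
    // One capital letter followed by letters of either case.
    static readonly Regex Relaxed = new Regex("^[A-Z][a-zA-Z]*$");

    public static bool IsStrict(string s) { return Strict.IsMatch(s); }
    public static bool IsRelaxed(string s) { return Relaxed.IsMatch(s); }

    static void Main()
    {
        foreach (string s in new[] { "Cat", "CaT", "cat", "Big cat", "Cat7", "" })
        {
            Console.WriteLine("'{0}': strict={1}, relaxed={2}", s, IsStrict(s), IsRelaxed(s));
        }
    }
}
```

One detail worth knowing: in .NET, $ also matches just before a final newline, while \z matches only at the very end. Console.ReadLine() strips the line terminator, so for console input the difference should not matter.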

I figured it out. Thanks everyone for trying to help.
string answer;
while (true)
{
Console.WriteLine("Enter your favorite animal:");
answer = Console.ReadLine();
if (new Regex("^[A-Z][a-z]*$").IsMatch(answer))
{
Console.WriteLine("You like {0}s. Cool!", answer);
Console.ReadKey();
break;
}
else
{
Console.WriteLine("'{0}' is not a valid answer.", answer);
Console.WriteLine();
Console.WriteLine("Make sure:");
Console.WriteLine("You are entering a one word answer.");
Console.WriteLine("You are only using letters.");
Console.WriteLine("You are capitalizing the first letter of the word.");
Console.WriteLine();
Console.WriteLine("Try again:");
}
}

Regex to skip certain characters

I want to write a Regex which would skip characters like < and >.
To represent this I came across [^<>] and tried using it in a console application, but it does not work.
string value = "shubh<";
string regEx = "[^<>]";
Regex rx = new Regex(regEx);
if (rx.IsMatch(value))
{
Console.WriteLine("Pass");
}
else { Console.WriteLine("Fail"); }
Console.ReadLine();
The string 'shubh<' should fail, but I am not sure why it passes the match. Am I doing something wrong?

From Regex.IsMatch Method (String):
Indicates whether the regular expression specified in the Regex constructor finds a match in a specified input string.
[^<>] is found in shubh< (the s, the h, etc.).
You need to use the ^ and $ anchors:
Regex rx = new Regex("^[^<>]*$");
if (rx.IsMatch(value)) {
Console.WriteLine("Pass");
} else {
Console.WriteLine("Fail");
}
Another solution is to check if < or > is contained:
Regex rx = new Regex("[<>]");
if (rx.IsMatch(value)) {
Console.WriteLine("Fail");
} else {
Console.WriteLine("Pass");
}
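The difference between the unanchored and anchored patterns can be checked side by side. This is a minimal sketch; AngleBracketCheck and its method names are hypothetical, and the last helper shows a regex-free alternative using string.IndexOfAny.

```csharp
using System;
using System.Text.RegularExpressions;

static class AngleBracketCheck
{
    // Succeeds as soon as ANY single character is not '<' or '>'.
    public static bool UnanchoredPasses(string value)
    {
        return Regex.IsMatch(value, "[^<>]");
    }

    // Succeeds only when EVERY character is something other than '<' or '>'.
    public static bool AnchoredPasses(string value)
    {
        return Regex.IsMatch(value, "^[^<>]*$");
    }

    // Regex-free variant: true when the string contains '<' or '>'.
    public static bool ContainsBracket(string value)
    {
        return value.IndexOfAny(new[] { '<', '>' }) >= 0;
    }

    static void Main()
    {
        foreach (string v in new[] { "shubh<", "shubh", "<>" })
        {
            Console.WriteLine("'{0}': unanchored={1}, anchored={2}, containsBracket={3}",
                v, UnanchoredPasses(v), AnchoredPasses(v), ContainsBracket(v));
        }
    }
}
```

Note that the anchored pattern uses *, so an empty string passes; use + instead if empty input should be rejected.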

Regex to match separate integer list in c#

I want to match a comma-separated integer list using Regex. I have used the pattern below, but it doesn't work for me.
if (!Regex.IsMatch(textBox_ImportRowsList.Text, @"^([0-9]+(,[0-9]+))*"))
{
errorProvider1.SetError(label_ListRowPosttext, "Row Count invalid!");
}
Valid Inputs:
1
1,2
1,4,6,10
Invalid Inputs:
1,
1.1
1,A
2,/,1
,1,3

use this regular expression:
^\d+(,\d+)*$

EDIT
Best way to validate your comma-separated string is
string someString = "1,2,3";
// Split on the comma and flag the input if any piece is not an integer
bool hasInvalidEntry = someString.Split(',')
    .Any(s => !isNumeric(s));
if (hasInvalidEntry)
    Console.WriteLine("invalid number");
else
    Console.WriteLine("valid number");
public bool isNumeric(string val)
{
    if (val == String.Empty)
        return false;
    int result;
    return int.TryParse(val, out result);
}
The following might also work for you. This regex also accepts an empty string:
^(\d+(,\d+)*)?$
or, if an empty string should be rejected:
^\d+(,\d+)*$
Start with an integer, \d+, which is one or more digit characters ('0'-'9').
Then add a parenthesized group containing ,\d+ and put an asterisk after it, which allows zero or more repetitions of a comma followed by an integer.

You've got the asterisk in the wrong place. Instead of this:
@"^([0-9]+(,[0-9]+))*"
...use this:
@"^([0-9]+(,[0-9]+)*)"
Additionally, you should anchor the end like you did the beginning, and don't really need the outermost set of parentheses:
@"^[0-9]+(,[0-9]+)*$"
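A quick way to see the effect of the misplaced asterisk is to run both patterns over the valid and invalid inputs from the question. This is an illustrative sketch; RowListPattern and its members are hypothetical names.

```csharp
using System;
using System.Text.RegularExpressions;

static class RowListPattern
{
    // Original pattern: the whole group is optional and the end is not anchored,
    // so it can match zero characters at the start of any input.
    public const string Misplaced = @"^([0-9]+(,[0-9]+))*";
    // Corrected pattern: one integer, then zero or more ",integer" groups, anchored at both ends.
    public const string Corrected = @"^[0-9]+(,[0-9]+)*$";

    public static bool IsValid(string input)
    {
        return Regex.IsMatch(input, Corrected);
    }

    static void Main()
    {
        string[] inputs = { "1", "1,2", "1,4,6,10", "1,", "1.1", "1,A", "2,/,1", ",1,3" };
        foreach (string input in inputs)
        {
            Console.WriteLine("'{0}': misplaced={1}, corrected={2}",
                input, Regex.IsMatch(input, Misplaced), IsValid(input));
        }
    }
}
```

The misplaced version reports a match for every input, which is why the error provider in the question never fires.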

You could use ^\d+(,\d+)*$, but as @Lazarus has pointed out, this may be a case where regex is a bit of overkill and string.Split() would be a better fit. You could even combine it with int.TryParse if you need to work with the numbers.

you can try with this code
List<string> list = new List<string>();
list.Add("1");
list.Add("1.1");
list.Add("1,A");
list.Add("2,/,1");
foreach (var item in list)
{
if (!Regex.IsMatch(item, @"^([0-9]+(,[0-9]+)*)$"))
{
Console.WriteLine("no match :" + item);
}
}

try this one
String strInput = textBox_ImportRowsList.Text;
// Keep empty entries so that inputs like "1," and ",1,3" are rejected
foreach (String s in strInput.Split(','))
{
    // Each piece between commas must consist of digits only
    if (!Regex.IsMatch(s, @"^\d+$"))
    {
        errorProvider1.SetError(label_ListRowPosttext, "Row Count invalid!");
        break;
    }
}

Grouping Logical Operators (Multiple Sets of Conditions) in a do {} while () loop?

I currently have the following loop:
do {
checkedWord = articleWords.Dequeue().TrimEnd('?', '.', ',', '!');
correct = _spellChecker.CheckWord(checkedWord);
} while (correct && articleWords.Count > 0);
I am queuing up the words from an array that was split from a textbox with ' ' as the separator. The loop works fine, except for I don't want any blank entries "" or really anything non alpha-numeric to stop the loop. Currently if there's more than one space between the words then the loop ends and it continues on to get word suggestions from the spellchecker.
If I do while (correct && articleWords.Count > 0 || checkedWord == ""); then it'll skip any blank queue entries, but it still hangs up on things like new lines - so if the textbox that it loads from contains a couple of paragraphs it screws up at the newline separating the two. I've also tried while (correct && articleWords.Count > 0 || !Char.IsLetterOrDigit(checkedWord, 0)); but that also doesn't work.
Question 1: Can you group conditions like (statement1 == true && count > 0) || (statement1 == false && Char.IsLetterOrDigit(char))? - Meaning that all of the conditions in the first grouping must be met OR all of the conditions in the second set must be.
Question 2: I want my loop to continue progressing until an actual spelling error is found, and for it to ignore things like empty queue entries, as well as anything that's not an alpha-numeric character at the beginning of the string.
I suspect that I'm close with the Char.IsLetterOrDigit bit, but have to figure out how to do it correctly.
Let me know if more info is required.
Thanks!

Avoid composite loop conditions. A good practice is a while loop with one simple condition, plus a 'break' in the loop body at the point where you need to leave it.
You can use something like this:
public void Test()
{
var separators = new[] { ' ', '\t', '\r', '\n', '\x00a0', '\x0085', '?', ',', '.', '!' };
var input = "Test string, news112! news \n next, line! error in error word";
var tokens = new Queue<string>(input.Split(separators, StringSplitOptions.RemoveEmptyEntries));
string currentWord = null;
while (tokens.Any())
{
currentWord = tokens.Dequeue();
if (currentWord.All(c => Char.IsLetterOrDigit(c)))
{
if (!CheckSpell(currentWord))
{
break;
}
}
}
}
public bool CheckSpell(string word)
{
return word != null && word.Length > 0 && word[0] != 'e';
}

If your goal is to find the first error, you can skip the while loop and do the following:
var firstError = tokens.Where(t => t.All(Char.IsLetterOrDigit) && !_spellChecker.CheckWord(t)).FirstOrDefault();

So long as you have a valid boolean expression, you can do that without an issue. This is something that is very easy to test.
To use Char.IsLetterOrDigit you will need to loop over each character in the string and test it.
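As a rough illustration of both points, the sketch below skips tokens that are empty or contain non-alphanumeric characters and stops at the first real spelling error. The isCorrect delegate stands in for _spellChecker.CheckWord, and SpellScan, FirstMisspelled and the tiny word list are assumptions invented for this example.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

static class SpellScan
{
    // Returns the first purely alphanumeric token rejected by the checker, or null if there is none.
    public static string FirstMisspelled(IEnumerable<string> tokens, Func<string, bool> isCorrect)
    {
        foreach (string token in tokens)
        {
            // Loop over every character of the token to decide whether it is worth checking.
            bool checkable = token.Length > 0 && token.All(c => char.IsLetterOrDigit(c));

            // Grouped condition: keep going when the token is skipped OR (checkable AND correct).
            if (!checkable || (checkable && isCorrect(token)))
            {
                continue;
            }
            return token;
        }
        return null;
    }

    static void Main()
    {
        // Hypothetical stand-in for a real spell checker.
        var known = new HashSet<string>(StringComparer.OrdinalIgnoreCase) { "this", "is", "words" };
        string[] tokens = { "This", "", "is", "--", "tset", "words" };
        Console.WriteLine(FirstMisspelled(tokens, w => known.Contains(w)) ?? "(no errors)");
    }
}
```

In C#, && binds more tightly than ||, so a condition such as correct && articleWords.Count > 0 || checkedWord == "" is already evaluated as (correct && articleWords.Count > 0) || checkedWord == "". Explicit parentheses mainly make the intent easier to read.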

When it comes to text selection you should consider regular expressions. They are a powerful and fast tool for text queries, and a simple pattern like the one below runs in time roughly linear in the input length. They help because you don't have to think about how to select your text values; you just specify what you need.
Try this code. The pattern @"\w+" selects every run of one or more word characters (letters, digits and underscore). If I wanted to select all words that start with the letter 't', then I would write @"\bt\w*".
using System;
using System.Text;
using System.Text.RegularExpressions;
namespace Test
{
class Program
{
static void Main(string[] args)
{
string str = "The quick brown fox jumps over the lazy dog";
Regex regex = new Regex(@"\w+", RegexOptions.Compiled);
for (Match match = regex.Match(str); match.Success; match = match.NextMatch())
{
Console.WriteLine(match.Value);
}
}
}
}

Q1. Absolutely. Just make sure your groupings are properly separated.
Q2. For performance and clarity, I would use REGEX to clean your input before queuing:
using System.Text.RegularExpressions;
...
string input = GetArticle(); // or however you get your input
input = Regex.Replace(input, @"[^0-9\w\s]", string.Empty);
// not sure what your separators but you can always
// reduce multiple spaces to a single space and just split
// on the single space
var articleWords = new Queue<string>(input.Split( ... ));
do {
checkedWord = articleWords.Dequeue();
// put your conditional grouping here if you want
if(!_spellChecker.CheckWord(checkedWord)) {
// only update "correct" if necessary - writes are more expensive =)
// although with the "break" below you shouldn't need "correct" anymore
// correct = false;
// in case you want to raise an event - it's cleaner =)
OnSpellingError(checkedWord);
// if you want to stop looping because of the error
break;
}
}
while(articleWords.Count > 0);
I wouldn't use Char.IsLetterOrDigit as I think that would be slower...besides, with the REGEX in the beginning you should have taken care of the entries that aren't characters or digits.
EDIT Adding LINQ response
On second thought, I think you're just trying to find spelling errors so how about this?
using System.Text.RegularExpressions;
...
string input = GetArticle(); // or however you get your input
// clean up words from punctuation
input = Regex.Replace(input, @"[^0-9\w\s]", string.Empty);
// collapse runs of whitespace into a single space
input = Regex.Replace(input, @"\s+", " ");
// Where (not All) collects the words that fail the spell check
var errors = input.Split(new char[] { ' ' }, StringSplitOptions.RemoveEmptyEntries).Where(w => !_spellChecker.CheckWord(w));
Then you can just do whatever you want with the errors =) Or you can just use a .Any if you just want to know whether there exists a spelling mistake at all.

Q1: Sure, it's possible to group conditions like that.
Q2: What about something like this?
string[] articleWords = textBoxText.Split(' ');
articleWords = articleWords.Select(a => a.Trim()).ToArray(); // remove all whitespaces (blanks, linebreaks)
articleWords = articleWords.Where(a => !string.IsNullOrEmpty(a)).ToArray(); // remove empty strings
bool correct = false;
bool spellingErrorFound = false;
for (int i = 0; i < articleWords.Length; i++)
{
string checkedWord = articleWords[i].TrimEnd('?', '.', ',', '!');
correct = _spellChecker.CheckWord(checkedWord);
if (!correct)
spellingErrorFound = true;
}

Regular Expression To Split On Comma Except If Quoted

What is the regular expression to split on comma (,) except if surrounded by double quotes? For example:
max,emily,john = ["max", "emily", "john"]
BUT
max,"emily,kate",john = ["max", "emily,kate", "john"]
Looking to use in C#: Regex.Split(string, "PATTERN-HERE");
Thanks.

Situations like this often call for something other than regular expressions. They are nifty, but patterns for handling this kind of thing are more complicated than they are useful.
You might try something like this instead:
public static IEnumerable<string> SplitCSV(string csvString)
{
var sb = new StringBuilder();
bool quoted = false;
foreach (char c in csvString) {
if (quoted) {
if (c == '"')
quoted = false;
else
sb.Append(c);
} else {
if (c == '"') {
quoted = true;
} else if (c == ',') {
yield return sb.ToString();
sb.Length = 0;
} else {
sb.Append(c);
}
}
}
if (quoted)
throw new ArgumentException("Unterminated quotation mark.", "csvString");
yield return sb.ToString();
}
It probably needs a few tweaks to follow the CSV spec exactly, but the basic logic is sound.
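One such tweak is handling a doubled quote ("") inside a quoted field as a literal quote character. The sketch below is a variation on the method above, not the original author's code; the names CsvSketch and SplitCsvLine are hypothetical, and it returns a list rather than using yield so the unterminated-quote error surfaces immediately.

```csharp
using System;
using System.Collections.Generic;
using System.Text;

static class CsvSketch
{
    public static List<string> SplitCsvLine(string line)
    {
        var fields = new List<string>();
        var sb = new StringBuilder();
        bool quoted = false;
        for (int i = 0; i < line.Length; i++)
        {
            char c = line[i];
            if (quoted)
            {
                if (c == '"')
                {
                    if (i + 1 < line.Length && line[i + 1] == '"')
                    {
                        sb.Append('"'); // "" inside quotes is an escaped quote
                        i++;            // skip the second quote of the pair
                    }
                    else
                    {
                        quoted = false; // closing quote
                    }
                }
                else
                {
                    sb.Append(c);
                }
            }
            else if (c == '"')
            {
                quoted = true;
            }
            else if (c == ',')
            {
                fields.Add(sb.ToString());
                sb.Clear();
            }
            else
            {
                sb.Append(c);
            }
        }
        if (quoted)
            throw new FormatException("Unterminated quotation mark.");
        fields.Add(sb.ToString());
        return fields;
    }

    static void Main()
    {
        foreach (string field in SplitCsvLine("max,\"emily,kate\",\"say \"\"hi\"\"\",john"))
            Console.WriteLine(field);
    }
}
```

It still ignores other parts of the CSV format, such as quoted fields that span multiple lines.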

This is a clear-cut case for a CSV parser, so you should be using .NET's own CSV parsing capabilities or cdhowie's solution.
Purely for your information and not intended as a workable solution, here's what contortions you'd have to go through using regular expressions with Regex.Split():
You could use the regex (please don't!)
(?<=^(?:[^"]*"[^"]*")*[^"]*) # assert that there is an even number of quotes before...
\s*,\s* # the comma to be split on...
(?=(?:[^"]*"[^"]*")*[^"]*$) # as well as after the comma.
if your quoted strings never contain escaped quotes, and you don't mind the quotes themselves becoming part of the match.
This is horribly inefficient, a pain to read and debug, works only in .NET, and it fails on escaped quotes (at least if you're not using "" to escape a single quote). Of course the regex could be modified to handle that as well, but then it's going to be perfectly ghastly.
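The answer does not say which built-in CSV facility it has in mind. One candidate that ships with .NET is TextFieldParser from the Microsoft.VisualBasic.FileIO namespace (usable from C# once the Microsoft.VisualBasic assembly is referenced). The following is a minimal sketch under that assumption; BuiltInCsvSplit and SplitLine are hypothetical names.

```csharp
using System;
using System.IO;
using Microsoft.VisualBasic.FileIO;

static class BuiltInCsvSplit
{
    // Splits a single CSV line, honouring double-quoted fields.
    public static string[] SplitLine(string line)
    {
        using (var parser = new TextFieldParser(new StringReader(line)))
        {
            parser.TextFieldType = FieldType.Delimited;
            parser.SetDelimiters(",");
            parser.HasFieldsEnclosedInQuotes = true;
            return parser.ReadFields();
        }
    }

    static void Main()
    {
        foreach (string field in SplitLine("max,\"emily,kate\",john"))
            Console.WriteLine(field);
    }
}
```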

A little late maybe but I hope I can help someone else
String[] cols = Regex.Split("max, emily, john", @"\s*,\s*");
foreach ( String s in cols ) {
Console.WriteLine(s);
}

Justin, resurrecting this question because it had a simple regex solution that wasn't mentioned. This situation sounds straight out of Match (or replace) a pattern except in situations s1, s2, s3 etc.
Here's our simple regex:
"[^"]*"|(,)
The left side of the alternation matches complete "quoted strings" tags. We will ignore these matches. The right side matches and captures commas to Group 1, and we know they are the right commas because they were not matched by the expression on the left. We replace these commas with SplitHere, then we split on SplitHere.
This program shows how to use the regex (see the results at the bottom of the online demo):
using System;
using System.Text.RegularExpressions;
using System.Collections.Specialized;
class Program
{
static void Main() {
string s1 = @"max,""emily,kate"",john";
var myRegex = new Regex(@"""[^""]*""|(,)");
string replaced = myRegex.Replace(s1, delegate(Match m) {
if (m.Groups[1].Value == "") return m.Value;
else return "SplitHere";
});
string[] splits = Regex.Split(replaced,"SplitHere");
foreach (string split in splits) Console.WriteLine(split);
Console.WriteLine("\nPress Any Key to Exit.");
Console.ReadKey();
} // END Main
} // END Program
Reference
How to match (or replace) a pattern except in situations s1, s2, s3...
