I've looked around this site for a good PO Box regex and didn't find any that I liked or worked consistently, so I tried my hand at making my own... I feel pretty good about it, but I'm sure the kind folks here on SO can poke some holes in it :) So... what problems do you see with this and what false-positives/false-negatives can you think up that would get through?
One caveat that I can see is that the PO Box pattern has to be at the start of the string, but what else is wrong with it?
public bool AddressContainsPOB(string Addr)
{
string input = Addr.Trim().ToLower();
bool Result = false;
Regex regexObj1 = new Regex(@"^p(ost){0,1}(\.){0,1}(\s){0,2}o(ffice){0,1}(\.){0,1}((\s){1}|b{1}|[1-9]{1})");
Regex regexObj2 = new Regex(@"^pob((\s){1}|[0-9]{1})");
Regex regexObj3 = new Regex(@"^box((\s){1}|[0-9]{1})");
Match match1 = regexObj1.Match(input);
if (match1.Success)
{ Result = true; }
Match match2 = regexObj2.Match(input);
if (match2.Success)
{ Result = true; }
Match match3 = regexObj3.Match(input);
if (match3.Success)
{ Result = true; }
return Result;
}
What do you expect from us? You don't even give us valid/invalid strings. Have you tested your regexes somehow?
What I see at the first glance, without knowing something about valid input is:
One caveat that I can see is that the PO Box pattern has to be at the start of the string
Do you want to match it only at the start of the string or not? You need to know that and define it in your pattern. If you don't want to, then remove the start of the string anchor ^ and replace it with a word boundary \b.
{1} is superfluous, you can just remove it.
For {0,1} there is a shortform ?, I like this better, because it is shorter.
^box((\s){1}|[0-9]{1}) matches either "box" followed by a whitespace OR followed by a digit. Is this really what you want to match?
(\.) in the first regex: Why do you group a single dot?
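To make the points above concrete, here is a rough sketch of the first regex with those simplifications applied (? instead of {0,1}, no {1}, no group around the dot, and \b instead of ^). It is written in Java only for illustration; the pattern syntax used here is the same in .NET. The class and method names are made up, and this is not meant as a complete PO Box detector.

```java
import java.util.regex.Pattern;

// Hypothetical helper: a simplified form of the first regex from the question.
class PoBoxSketch {
    // \b lets the pattern start at any word boundary instead of only at the
    // start of the string; CASE_INSENSITIVE replaces the ToLower() call.
    private static final Pattern PO_BOX = Pattern.compile(
            "\\bp(ost)?\\.?\\s{0,2}o(ffice)?\\.?(\\s|b|[1-9])",
            Pattern.CASE_INSENSITIVE);

    static boolean containsPoBox(String address) {
        return PO_BOX.matcher(address.trim()).find();
    }
}
```

With this sketch, "P.O. Box 123" and "12 Main St, PO Box 4" are reported as PO Boxes and "123 Poplar St" is not. It still produces false positives such as "Po Valley Rd", because "po" followed by whitespace is enough to match, so the question about valid and invalid inputs still stands.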
This is an extra exercise given to us in our Uni course, in which we need to find whether a sentence contains a palindrome or not. Finding whether a word is a palindrome is fairly easy, but the given sentence could look like this: "Dog cat - kajak house". My plan is to use functions I already wrote to first determine whether a character is a letter and delete it if not, then count the number of spaces + 1 to find out how many words there are in the sentence, prepare an array of those words, and call a function that checks whether a word is a palindrome on every element of the array. However, a double space would mess everything up in the "counting" phase. I've spent around an hour fiddling with code to do this, but I can't wrap my head around it. Could anyone help me? Note that I'm not supposed to use any external methods or libraries. I've done this using RegEx and it was fairly easy, but I'd like to do this "legally". Thanks in advance!
Just split on space, the trick is to remove empties. Google StringSplitOptions.RemoveEmptyEntries
then obviously join with one clean space
You could copy all the characters you are interested in to a new string:
var copy = new StringBuilder();
// Indicates whether the previous character was whitespace
var whitespace = true;
foreach (var character in originalString)
{
if (char.IsLetter(character))
{
copy.Append(character);
whitespace = false;
}
else if (char.IsWhiteSpace(character) && !whitespace)
{
copy.Append(character);
whitespace = true;
}
else
{
// Ignore other characters
}
}
If you're looking for the most rudimentary/olde-worlde option, this will work. But no-one would actually do this, other than in an illustrative manner.
string test = "Dog cat  kajak house"; // Your string minus any non-letters (note the two spaces)
int prevLen;
do
{
prevLen = test.Length;
test = test.Replace("  ", " "); // Replace two spaces with one; repeat until none are left
} while (prevLen > test.Length);
Personally, I'd probably do the following to end up with an array of words to check:
string[] words = test.Split(new char[] { ' ' }, StringSplitOptions.RemoveEmptyEntries);
I'm new to using Regex, I've been going through a rake of tutorials but I haven't found one that applies to what I want to do,
I want to search for something, but return everything following it but not the search string itself
e.g. "Some lame sentence that is awesome"
search for "sentence"
return "that is awesome"
Any help would be much appreciated
This is my regex so far
sentence(.*)
but it returns: sentence that is awesome
Pattern pattern = Pattern.compile("sentence(.*)");
Matcher matcher = pattern.matcher("some lame sentence that is awesome");
boolean found = false;
while (matcher.find())
{
System.out.println("I found the text: " + matcher.group().toString());
found = true;
}
if (!found)
{
System.out.println("I didn't find the text");
}
You can do this with "just the regular expression" as you asked for in a comment:
(?<=sentence).*
(?<=sentence) is a positive lookbehind assertion. This matches at a certain position in the string, namely at a position right after the text sentence without making that text itself part of the match. Consequently, (?<=sentence).* will match any text after sentence.
This is quite a nice feature of regex. However, in Java this will only work for finite-length subexpressions, i.e. (?<=sentence|word|(foo){1,4}) is legal, but (?<=sentence\s*) isn't.
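As a minimal sketch of the lookbehind approach in Java (the class and method names are made up for illustration; Pattern.quote is used so the keyword is treated as literal text):

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper showing the (?<=...) lookbehind from the answer above.
class LookbehindSketch {
    // Returns everything after the first occurrence of keyword on the same
    // line, or null if the keyword does not occur.
    static String textAfter(String keyword, String input) {
        Pattern pattern = Pattern.compile("(?<=" + Pattern.quote(keyword) + ").*");
        Matcher matcher = pattern.matcher(input);
        return matcher.find() ? matcher.group() : null;
    }
}
```

Calling textAfter("sentence", "Some lame sentence that is awesome") returns " that is awesome", including the leading space, because the match starts directly after the keyword. Call trim() on the result if you don't want it.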
Your regex "sentence(.*)" is right. To retrieve the contents of the group in parenthesis, you would call:
Pattern p = Pattern.compile( "sentence(.*)" );
Matcher m = p.matcher( "some lame sentence that is awesome" );
if ( m.find() ) {
String s = m.group(1); // " that is awesome"
}
Note the use of m.find() in this case (attempts to find anywhere on the string) and not m.matches() (would fail because of the prefix "some lame"; in this case the regex would need to be ".*sentence(.*)")
If the Matcher is initialized with str, you can get the part after the match with
str.substring(matcher.end())
Sample Code:
final String str = "Some lame sentence that is awesome";
final Matcher matcher = Pattern.compile("sentence").matcher(str);
if(matcher.find()){
System.out.println(str.substring(matcher.end()).trim());
}
Output:
that is awesome
You need to use the group(int) of your matcher - group(0) is the entire match, and group(1) is the first group you marked. In the example you specify, group(1) is what comes after "sentence".
You just need to put "group(1)" instead of "group()" in the following line and the return will be the one you expected:
System.out.println("I found the text: " + matcher.group(1));
I am trying to see if my string starts with a string in an array of strings I've created. Here is my code:
string x = "Table a";
string y = "a table";
string[] arr = new string[] { "table", "chair", "plate" };
if (arr.Contains(x.ToLower())){
// this should be true
}
if (arr.Contains(y.ToLower())){
// this should be false
}
How can I make it so my if statement comes up true? I'd like to just match the beginning of string x against the contents of the array while ignoring the case and the following characters. I thought I needed regex to do this but I could be mistaken. I'm a bit of a newbie with regex.
It seems you want to check if your string contains an element from your list, so this should be what you are looking for:
if (arr.Any(c => x.ToLower().Contains(c)))
Or simpler:
if (arr.Any(x.ToLower().Contains))
Or based on your comments you may use this:
if (arr.Any(x.ToLower().Split(' ')[0].Contains))
Because you said you want regex...
you can set a regex to var regex = new Regex("(table|plate|fork)");
and check for if(regex.IsMatch(myString)) { ... }
but for the issue at hand you don't have to use Regex, as you are searching for an exact substring. You can use (as @S.Akbari mentioned):
if (arr.Any(c => x.ToLower().Contains(c))) { ... }
Enumerable.Contains matches exact values (and there is no built-in comparer that checks for "starts with"), so you need Any, which takes a predicate that receives each array element and performs the check. The first step is to turn "contains" the other way around, so that the given string is checked for containing an element of the array:
var myString = "some string";
if (arr.Any(arrayItem => myString.Contains(arrayItem)))...
Now, you are actually asking for "string starts with given word" and not just "contains", so you need StartsWith (which conveniently lets you specify case sensitivity, unlike Contains; see Case insensitive 'Contains(string)'):
if (arr.Any(arrayItem => myString.StartsWith(
arrayItem, StringComparison.CurrentCultureIgnoreCase))) ...
Note that this code will accept "tableAAA bob". If you really need to break on a word boundary, a regular expression may be a better choice. Building regular expressions dynamically is trivial as long as you properly escape all the values.
Regex should be
beginning of string - ^
properly escaped word you are searching for - Escape Special Character in Regex
word break - \b
if (arr.Any(arrayItem => Regex.IsMatch(myString,
String.Format(@"^{0}\b", Regex.Escape(arrayItem)),
RegexOptions.IgnoreCase))) ...
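The same three ingredients (start anchor, escaped word, word break) can be sketched in Java, where Pattern.quote plays the role of Regex.Escape. The class and method names are made up for illustration:

```java
import java.util.regex.Pattern;

// Hypothetical helper: does input start with one of the words, as a whole word?
class StartsWithWordSketch {
    static boolean startsWithAnyWord(String[] words, String input) {
        for (String word : words) {
            // ^ anchors at the start, Pattern.quote escapes the word,
            // \b requires a word boundary right after it.
            Pattern p = Pattern.compile("^" + Pattern.quote(word) + "\\b",
                    Pattern.CASE_INSENSITIVE);
            if (p.matcher(input).find()) {
                return true;
            }
        }
        return false;
    }
}
```

With the array from the question, "Table a" is accepted, while "a table" and "tableAAA bob" are rejected.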
you can do something like below using TypeScript. Instead of Starts with you can also use contains or equals etc..
public namesList: Array<string> = ['name1','name2','name3','name4','name5'];
// SomeString = 'name1, Hello there';
private isNamePresent(SomeString : string):boolean{
if (this.namesList.find(name => SomeString.startsWith(name)))
return true;
return false;
}
I think I understand what you are trying to say here, although there is still some ambiguity. Are you trying to see if one word in your String (which is a sentence) exists in your array?
@Amy is correct, this might not have to do with Regex at all.
I think this segment of code will do what you want in Java (which can easily be translated to C#):
Java:
x = x.toLowerCase();
String[] words = x.split("\\s+");
for (String word : words) {
for (String element : arr) {
if (element.equals(word)) {
return true;
}
}
}
return false;
You can also use a Set to store the elements in your array, which can make look up more efficient.
Java:
x = x.toLowerCase();
String[] words = x.split("\\s+");
Set<String> set = new HashSet<>(Arrays.asList(arr)); // needs java.util.Arrays, HashSet and Set
for (String word : words) {
if (set.contains(word)) {
return true;
}
}
return false;
Edit: (12/22, 11:05am)
I rewrote my solution in C#, thanks to reminders by @Amy and @JohnyL. Since the author only wants to match the first word of the string, this edited code should work :)
C#:
static bool contains(){
x = x.ToLower();
string[] words = x.Split(' ');
var set = new HashSet<string>(arr);
if(set.Contains(words[0])){
return true;
}
return false;
}
Sorry my question was so vague but here is the solution thanks to some help from a few people that answered.
var regex = new Regex("^(table|chair|plate) *.*");
if (regex.IsMatch(x.ToLower())){}
I am studying regex, but still find it hard to learn.
So my problem is this: I am given a set of keywords:
The quick brown fox
which I have to find in a bunch of sentences like:
the Brown SexyFox Jumps soQuickly in the backyard...
If there is any match with these words (not case-sensitive):
The, the, brown, Brown, fox, Fox, quick, Quick
Then I can say that return value is true
How to do it in regex? I was thinking to split the words and put in Array and use loop and find them using .Contains(...) but I know that is not ideal.
Actually I have another concern. But I'm afraid to post it as a new question.
So my second question is, how does regex read the pattern? What are the priorities and least priorities?
Anyway please help me with my problem.
EDIT
Sorry for the late response, but the solution of @PatrikW seems not to work.
I have static class:
public static bool ValidateRegex(string value, string regex)
{
value += ""; // Fail safe for null
Regex obj = new Regex(regex, RegexOptions.IgnoreCase);
if (value.Trim() == "")
return false;
else
{
return obj.IsMatch(value);
}
}
Construct regex pattern:
keyword = "maria";
string regexPattern = "(?<=\b)(";
string Or = string.Empty;
foreach (string item in keyword.Split(new char[] { ' ', ',', '.' }, StringSplitOptions.RemoveEmptyEntries).ToList())
{
regexPattern += Or + "(" + item + ")";
Or = "|";
}
regexPattern += ")(?=\b)";
Data information:
List<Friend> useritems = null;
useritems = ((List<Friend>)SessonHandler.Data.FriendList).Where(i =>
Utility.ValidateRegex(i.LastName, regexPattern) ||
Utility.ValidateRegex(i.FirstName, regexPattern) ||
Utility.ValidateRegex(i.MiddleName, regexPattern)).ToList();
//regexPattern = "(?<=\b)((maria))(?=\b)"
//LastName = "MARIA CALIBRI"
//FirstName = "ALICE"
//MiddleName = null
May be I did something wrong with the code. Please help.
EDIT 2
I forgot the @ sign. This must work now:
string regexPattern = @"(?<=\b)(";
.
.
.
regexPattern += @")(?=\b)";
The answer below is correct.
What Felice showed is the more dynamic solution, but here's a pattern for finding the exact keywords you've got:
"(?<=\b)((The)|(quick)|(brown)|(fox))(?=\b)"
Because of the leading and trailing word-boundary assertions (\b, here wrapped in a lookbehind and a lookahead), it will only match whole words and not parts of them.
Here's an example:
Regex foxey = new Regex(@"(?<=\b)((The)|(quick)|(brown)|(fox))(?=\b)", RegexOptions.IgnoreCase);
bool doesMatch = foxey.IsMatch("the Brown SexyFox Jumps soQuickly in the backyard...");
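If the keywords are not known in advance, the same whole-word pattern can be built from the keyword string. Here is a rough sketch in Java (the class and method names are made up; it uses plain \b rather than wrapping it in lookarounds, which matches the same positions, and Pattern.quote so that keywords containing regex metacharacters are treated literally):

```java
import java.util.Arrays;
import java.util.regex.Pattern;
import java.util.stream.Collectors;

// Hypothetical helper: whole-word, case-insensitive match of any keyword.
class KeywordSketch {
    // Turns "The quick brown fox" into \b(?:The|quick|brown|fox)\b,
    // with each keyword quoted.
    static Pattern anyWholeWord(String keywords) {
        String alternation = Arrays.stream(keywords.trim().split("\\s+"))
                .map(Pattern::quote)
                .collect(Collectors.joining("|"));
        return Pattern.compile("\\b(?:" + alternation + ")\\b",
                Pattern.CASE_INSENSITIVE);
    }

    static boolean containsAny(String keywords, String sentence) {
        return anyWholeWord(keywords).matcher(sentence).find();
    }
}
```

For the sentence from the question, containsAny("The quick brown fox", ...) is true because of "the" and "Brown", while containsAny("quick fox", ...) is false: "SexyFox" and "soQuickly" only contain the keywords as parts of longer words.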
Edit - Regex engine:
Simply put, the Regex-engine walks through the input-string one character at a time, starting at the leftmost one, checking it against the first part of the regex-pattern we've written. If it matches, the parser moves to the next character and checks it against the next part of the pattern. If it manages to successfully walk through the whole pattern, that is a match.
You can read about how the internals of regex works just by searching for "regex engine" or something along those lines. Here's a pick:
http://www.regular-expressions.info/engine.html
I am trying to extract substrings that have a certain format. The substring format is [CENAOD(xyx)]. I have written the following code, but when I run it in a loop it says all results match, which is wrong. What have I done wrong?
string strRegex = @"(\[CENAOD\((\S|\W)*\)\])*";
string strCenaOd = sReader["intro"].ToString();
if (Regex.IsMatch(strCenaOd, strRegex, RegexOptions.IgnoreCase))
{
string = (want to read content of ( ) = xyz in example)
}
Remove the outer ( ... )*.
That says no match is a good match too.
Or use + instead of *.
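A quick way to see why the outer ( ... )* makes everything match is to try it on a string without any tag. The sketch below uses Java only for illustration (the class, method, and constant names are made up, and the inner part is simplified to [^)]*); zero-or-more repetition behaves the same way in .NET:

```java
import java.util.regex.Pattern;

// Hypothetical demo of "no match is a good match too".
class EmptyMatchSketch {
    // One [CENAOD(...)] tag, without the outer group.
    static final String TAG = "\\[CENAOD\\([^)]*\\)\\]";

    static boolean found(String regex, String input) {
        return Pattern.compile(regex).matcher(input).find();
    }
}
```

found("(" + TAG + ")*", "no tag here") is true, because zero repetitions match the empty string at position 0. found(TAG, "no tag here") and found("(" + TAG + ")+", "no tag here") are both false, and both are true for "intro [CENAOD(xyz)] text".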
Adding to @Kent's and @leppie's answers, the code surrounding the regex needs work, too. I think this is what you were trying for:
string strRegex = @"\[CENAOD\(([^)]*)\)\]";
string strCenaOd = sReader["intro"].ToString();
Match m = Regex.Match(strCenaOd, strRegex, RegexOptions.IgnoreCase);
if (m.Success)
{
string content = m.Groups[1].Value;
// ...
}
IsMatch() is a simple yes-or-no check; it doesn't provide any way to retrieve the matched text.
I especially want to comment on (\S|\W)*, from your regex. First, \S|\W is a very inefficient way to match any character. . is usually all you need, but as Kent pointed out, [^)] (i.e., any character except )) is more appropriate in this case. Also, by placing the * outside the round brackets, you'll only ever capture the last character. ([^)]*) captures all of them. For more details, read this.
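The difference between putting the * outside and inside the round brackets is easy to check. This sketch is in Java for illustration (the class and method names are made up); java.util.regex also reports only the last iteration of a repeated group, just like Groups[1].Value in .NET:

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical demo: quantifier outside vs. inside the capturing group.
class CaptureSketch {
    // Returns the text captured by group 1, or null if there is no match.
    static String firstGroup(String regex, String input) {
        Matcher m = Pattern.compile(regex).matcher(input);
        return m.find() ? m.group(1) : null;
    }
}
```

For the input "[CENAOD(xyz)]", the pattern \[CENAOD\(([^)])*\)\] (star outside the group) yields "z", while \[CENAOD\(([^)]*)\)\] (star inside the group) yields "xyz".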
if you said "all strings", how about:
\[CENAOD\([^\)]*\)\]